[jira] [Commented] (SPARK-25119) stages in wrong order within job page DAG chart

2018-08-21 Thread Yunjian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588160#comment-16588160
 ] 

Yunjian Zhang commented on SPARK-25119:
---

Created a PR, as below:

[https://github.com/apache/spark/pull/22177]

 

> stages in wrong order within job page DAG chart
> ---
>
> Key: SPARK-25119
> URL: https://issues.apache.org/jira/browse/SPARK-25119
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Yunjian Zhang
>Priority: Minor
> Attachments: Screen Shot 2018-08-14 at 3.35.34 PM.png
>
>
> Multiple stages for the same job are shown in the wrong order in the DAG
> visualization on the job page.
> e.g.
> stage27   stage19 stage20 stage24 stage21
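
For illustration only, a minimal Scala sketch of the kind of reordering fix involved (hypothetical names, not the code in the PR above): if the page holds one DAG fragment per stage, sorting by stage ID before rendering yields the expected order.

object StageOrder {
  // Hypothetical sketch; StageGraph stands in for the per-stage DAG data.
  final case class StageGraph(stageId: Int, dotSource: String)

  def orderForDisplay(graphs: Seq[StageGraph]): Seq[StageGraph] =
    graphs.sortBy(_.stageId) // e.g. 27, 19, 20, 24, 21 -> 19, 20, 21, 24, 27
}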






[jira] [Updated] (SPARK-25119) stages in wrong order within job page DAG chart

2018-08-14 Thread Yunjian Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunjian Zhang updated SPARK-25119:
--
Attachment: Screen Shot 2018-08-14 at 3.35.34 PM.png

> stages in wrong order within job page DAG chart
> ---
>
> Key: SPARK-25119
> URL: https://issues.apache.org/jira/browse/SPARK-25119
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Yunjian Zhang
>Priority: Minor
> Attachments: Screen Shot 2018-08-14 at 3.35.34 PM.png
>
>
> Multiple stages for the same job are shown in the wrong order in the DAG
> visualization on the job page.
> e.g.
> stage27   stage19 stage20 stage24 stage21






[jira] [Updated] (SPARK-25119) stages in wrong order within job page DAG chart

2018-08-14 Thread Yunjian Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunjian Zhang updated SPARK-25119:
--
Description: 
Multiple stages for the same job are shown in the wrong order in the DAG
visualization on the job page.

e.g.

stage27   stage19 stage20 stage24 stage21

  was:
Multiple stages for the same job are shown in the wrong order on the job page.

e.g.

 


> stages in wrong order within job page DAG chart
> ---
>
> Key: SPARK-25119
> URL: https://issues.apache.org/jira/browse/SPARK-25119
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: Yunjian Zhang
>Priority: Minor
> Attachments: Screen Shot 2018-08-14 at 3.35.34 PM.png
>
>
> Multiple stages for the same job are shown in the wrong order in the DAG
> visualization on the job page.
> e.g.
> stage27   stage19 stage20 stage24 stage21






[jira] [Created] (SPARK-25119) stages in wrong order within job page DAG chart

2018-08-14 Thread Yunjian Zhang (JIRA)
Yunjian Zhang created SPARK-25119:
-

 Summary: stages in wrong order within job page DAG chart
 Key: SPARK-25119
 URL: https://issues.apache.org/jira/browse/SPARK-25119
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.1
Reporter: Yunjian Zhang


Multiple stages for the same job are shown in the wrong order on the job page.

e.g.

 






[jira] [Issue Comment Deleted] (SPARK-20973) insert table fail caused by unable to fetch data definition file from remote hdfs

2017-06-06 Thread Yunjian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunjian Zhang updated SPARK-20973:
--
Comment: was deleted

(was: I checked the source code and wrote a patch to fix the insert issue, as
below. I was unable to attach a file here, so I am pasting the content instead.
--
--- a/./workspace1/spark-2.1.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
+++ b/./workspace/git/gdr/spark/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
@@ -57,7 +57,7 @@ private[hive] class SparkHiveWriterContainer(
   extends Logging
   with HiveInspectors
   with Serializable {
-
+
   private val now = new Date()
   private val tableDesc: TableDesc = fileSinkConf.getTableInfo
   // Add table properties from storage handler to jobConf, so any custom storage
@@ -154,6 +154,12 @@ private[hive] class SparkHiveWriterContainer(
     conf.value.setBoolean("mapred.task.is.map", true)
     conf.value.setInt("mapred.task.partition", splitID)
   }
+
+  def newSerializer(tableDesc: TableDesc): Serializer = {
+    val serializer = tableDesc.getDeserializerClass.newInstance().asInstanceOf[Serializer]
+    serializer.initialize(null, tableDesc.getProperties)
+    serializer
+  }
 
   def newSerializer(jobConf: JobConf, tableDesc: TableDesc): Serializer = {
     val serializer = tableDesc.getDeserializerClass.newInstance().asInstanceOf[Serializer]
@@ -162,10 +168,11 @@ private[hive] class SparkHiveWriterContainer(
   }
 
   protected def prepareForWrite() = {
-    val serializer = newSerializer(jobConf, fileSinkConf.getTableInfo)
+    val serializer = newSerializer(conf.value, fileSinkConf.getTableInfo)
+    logInfo("CHECK table deser:" + fileSinkConf.getTableInfo.getDeserializer(conf.value))
     val standardOI = ObjectInspectorUtils
       .getStandardObjectInspector(
-        fileSinkConf.getTableInfo.getDeserializer.getObjectInspector,
+        fileSinkConf.getTableInfo.getDeserializer(conf.value).getObjectInspector,
         ObjectInspectorCopyOption.JAVA)
       .asInstanceOf[StructObjectInspector])

> insert table fail caused by unable to fetch data definition file from remote 
> hdfs 
> --
>
> Key: SPARK-20973
> URL: https://issues.apache.org/jira/browse/SPARK-20973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yunjian Zhang
>  Labels: patch
> Attachments: spark-sql-insert.patch
>
>
> I implemented my own Hive SerDe to handle special data files; it needs to
> read a data definition file during processing.
> The process includes:
> 1. read the definition file location from TBLPROPERTIES
> 2. read the file content per step 1
> 3. initialize the SerDe based on step 2
> //DDL of the table as below:
> -
> CREATE EXTERNAL TABLE dw_user_stg_txt_out
> ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
> STORED AS
>   INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
>   OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
> LOCATION 'hdfs://${remote_hdfs}/user/data'
> TBLPROPERTIES (
>   'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
> )
> // insert statement
> insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
> //fail with ERROR
> 17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table 
> dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
> java.lang.RuntimeException: FAILED to get dml file from: 
> hdfs://${remote-hdfs}/dml/user.dml
>   at 
> com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
>   at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)






[jira] [Updated] (SPARK-20973) insert table fail caused by unable to fetch data definition file from remote hdfs

2017-06-06 Thread Yunjian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunjian Zhang updated SPARK-20973:
--
Attachment: spark-sql-insert.patch

> insert table fail caused by unable to fetch data definition file from remote 
> hdfs 
> --
>
> Key: SPARK-20973
> URL: https://issues.apache.org/jira/browse/SPARK-20973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yunjian Zhang
>  Labels: patch
> Attachments: spark-sql-insert.patch
>
>
> I implemented my own Hive SerDe to handle special data files; it needs to
> read a data definition file during processing.
> The process includes:
> 1. read the definition file location from TBLPROPERTIES
> 2. read the file content per step 1
> 3. initialize the SerDe based on step 2
> //DDL of the table as below:
> -
> CREATE EXTERNAL TABLE dw_user_stg_txt_out
> ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
> STORED AS
>   INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
>   OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
> LOCATION 'hdfs://${remote_hdfs}/user/data'
> TBLPROPERTIES (
>   'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
> )
> // insert statement
> insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
> //fail with ERROR
> 17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table 
> dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
> java.lang.RuntimeException: FAILED to get dml file from: 
> hdfs://${remote-hdfs}/dml/user.dml
>   at 
> com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
>   at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)






[jira] [Comment Edited] (SPARK-20973) insert table fail caused by unable to fetch data definition file from remote hdfs

2017-06-02 Thread Yunjian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035597#comment-16035597
 ] 

Yunjian Zhang edited comment on SPARK-20973 at 6/2/17 11:06 PM:


I checked the source code and wrote a patch to fix the insert issue, as below.
I was unable to attach a file here, so I am pasting the content instead.
--
--- a/./workspace1/spark-2.1.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
+++ b/./workspace/git/gdr/spark/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
@@ -57,7 +57,7 @@ private[hive] class SparkHiveWriterContainer(
   extends Logging
   with HiveInspectors
   with Serializable {
-
+
   private val now = new Date()
   private val tableDesc: TableDesc = fileSinkConf.getTableInfo
   // Add table properties from storage handler to jobConf, so any custom storage
@@ -154,6 +154,12 @@ private[hive] class SparkHiveWriterContainer(
     conf.value.setBoolean("mapred.task.is.map", true)
     conf.value.setInt("mapred.task.partition", splitID)
   }
+
+  def newSerializer(tableDesc: TableDesc): Serializer = {
+    val serializer = tableDesc.getDeserializerClass.newInstance().asInstanceOf[Serializer]
+    serializer.initialize(null, tableDesc.getProperties)
+    serializer
+  }
 
   def newSerializer(jobConf: JobConf, tableDesc: TableDesc): Serializer = {
     val serializer = tableDesc.getDeserializerClass.newInstance().asInstanceOf[Serializer]
@@ -162,10 +168,11 @@ private[hive] class SparkHiveWriterContainer(
   }
 
   protected def prepareForWrite() = {
-    val serializer = newSerializer(jobConf, fileSinkConf.getTableInfo)
+    val serializer = newSerializer(conf.value, fileSinkConf.getTableInfo)
+    logInfo("CHECK table deser:" + fileSinkConf.getTableInfo.getDeserializer(conf.value))
     val standardOI = ObjectInspectorUtils
       .getStandardObjectInspector(
-        fileSinkConf.getTableInfo.getDeserializer.getObjectInspector,
+        fileSinkConf.getTableInfo.getDeserializer(conf.value).getObjectInspector,
         ObjectInspectorCopyOption.JAVA)
       .asInstanceOf[StructObjectInspector]


was (Author: daniel.yj.zh...@gmail.com):
I checked the source code and added a patch to fix the insert issue
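
For illustration, a minimal Scala sketch of why the patch above threads a live Configuration through this path (hypothetical helper, not the patched Spark code): a SerDe that fetches its definition file during initialize() cannot resolve a remote HDFS URI from a null Configuration.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DefinitionFiles {
  // Sketch only: resolving a remote HDFS path needs a real Configuration;
  // with conf == null this fails, matching the reported failure mode.
  def readDefinitionFile(conf: Configuration, uri: String): String = {
    val path = new Path(uri)
    val fs = FileSystem.get(path.toUri, conf)
    val in = fs.open(path)
    try scala.io.Source.fromInputStream(in).mkString finally in.close()
  }
}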

> insert table fail caused by unable to fetch data definition file from remote 
> hdfs 
> --
>
> Key: SPARK-20973
> URL: https://issues.apache.org/jira/browse/SPARK-20973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yunjian Zhang
>  Labels: patch
>
> I implemented my own Hive SerDe to handle special data files; it needs to
> read a data definition file during processing.
> The process includes:
> 1. read the definition file location from TBLPROPERTIES
> 2. read the file content per step 1
> 3. initialize the SerDe based on step 2
> //DDL of the table as below:
> -
> CREATE EXTERNAL TABLE dw_user_stg_txt_out
> ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
> STORED AS
>   INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
>   OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
> LOCATION 'hdfs://${remote_hdfs}/user/data'
> TBLPROPERTIES (
>   'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
> )
> // insert statement
> insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
> //fail with ERROR
> 17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table 
> dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
> java.lang.RuntimeException: FAILED to get dml file from: 
> hdfs://${remote-hdfs}/dml/user.dml
>   at 
> com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
>   at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)






[jira] [Commented] (SPARK-20973) insert table fail caused by unable to fetch data definition file from remote hdfs

2017-06-02 Thread Yunjian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035597#comment-16035597
 ] 

Yunjian Zhang commented on SPARK-20973:
---

I checked the source code and added a patch to fix the insert issue

> insert table fail caused by unable to fetch data definition file from remote 
> hdfs 
> --
>
> Key: SPARK-20973
> URL: https://issues.apache.org/jira/browse/SPARK-20973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yunjian Zhang
>  Labels: patch
>
> I implemented my own Hive SerDe to handle special data files; it needs to
> read a data definition file during processing.
> The process includes:
> 1. read the definition file location from TBLPROPERTIES
> 2. read the file content per step 1
> 3. initialize the SerDe based on step 2
> //DDL of the table as below:
> -
> CREATE EXTERNAL TABLE dw_user_stg_txt_out
> ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
> STORED AS
>   INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
>   OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
> LOCATION 'hdfs://${remote_hdfs}/user/data'
> TBLPROPERTIES (
>   'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
> )
> // insert statement
> insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
> //fail with ERROR
> 17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table 
> dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
> java.lang.RuntimeException: FAILED to get dml file from: 
> hdfs://${remote-hdfs}/dml/user.dml
>   at 
> com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
>   at 
> org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)






[jira] [Created] (SPARK-20973) insert table fail caused by unable to fetch data definition file from remote hdfs

2017-06-02 Thread Yunjian Zhang (JIRA)
Yunjian Zhang created SPARK-20973:
-

 Summary: insert table fail caused by unable to fetch data 
definition file from remote hdfs 
 Key: SPARK-20973
 URL: https://issues.apache.org/jira/browse/SPARK-20973
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Yunjian Zhang


I implemented my own Hive SerDe to handle special data files; it needs to
read a data definition file during processing.
The process includes (see the sketch after the stack trace below):
1. read the definition file location from TBLPROPERTIES
2. read the file content per step 1
3. initialize the SerDe based on step 2
//DDL of the table as below:
-
CREATE EXTERNAL TABLE dw_user_stg_txt_out
ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
STORED AS
  INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
  OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
LOCATION 'hdfs://${remote_hdfs}/user/data'
TBLPROPERTIES (
  'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
)
// insert statement
insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
//fail with ERROR
17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table 
dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
java.lang.RuntimeException: FAILED to get dml file from: 
hdfs://${remote-hdfs}/dml/user.dml
at 
com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
at 
org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
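
For illustration, a minimal Scala sketch of the three-step flow described above (the property name comes from the DDL; the class body is hypothetical, not the actual com.ebay.dss.gdr SerDe):

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

class DefinitionBackedSerDe {
  private var definition: String = _

  // Hive invokes initialize(conf, tableProperties) when the SerDe is created.
  def initialize(conf: Configuration, tbl: Properties): Unit = {
    // 1. read the definition file location from TBLPROPERTIES
    val dmlUri = tbl.getProperty("com.ebay.dss.dml.file")
    // 2. read the file content per step 1; this is where the reported
    //    RuntimeException surfaces when the remote HDFS is unreachable
    val path = new Path(dmlUri)
    val fs = FileSystem.get(path.toUri, conf)
    val in = fs.open(path)
    definition = try scala.io.Source.fromInputStream(in).mkString finally in.close()
    // 3. initialize the SerDe state from the definition content
  }
}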


