[GitHub] spark pull request #21783: [SPARK-24799]A solution of dealing with data skew...

2018-07-16 Thread marymwu
GitHub user marymwu opened a pull request:

https://github.com/apache/spark/pull/21783

[SPARK-24799]A solution of dealing with data skew in left,right,inner join

## What changes were proposed in this pull request?

    For left, right, and inner join execution, this solution mainly divides the 
partitions where data skew has occurred into several partitions with a smaller 
data scale, so that more tasks can be executed in parallel and efficiency is 
increased.
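
The patch itself is not reproduced here; as a rough illustration of the same idea, below is a minimal key-salting sketch in the DataFrame API (the toy tables, the "hot" key, and the `parallelism` value are illustrative assumptions, and only the inner-join case is shown):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SkewJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("skew-join-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val parallelism = 8 // how many smaller pieces each skewed key is split into

    val left  = Seq.fill(100000)(("hot", 1)).toDF("k", "v")   // heavily skewed join key
    val right = Seq(("hot", 10), ("cold", 20)).toDF("k", "w")

    // Divide the skewed side: append a random salt so one hot key spreads over several partitions.
    val saltedLeft = left.withColumn("salt", (rand() * parallelism).cast("int"))

    // Replicate the other side once per salt value so every salted key still finds its match.
    val salts = spark.range(parallelism).select(col("id").cast("int").as("salt"))
    val saltedRight = right.crossJoin(salts)

    // The hot key is now processed by up to `parallelism` tasks instead of a single one.
    val joined = saltedLeft.join(saltedRight, Seq("k", "salt"), "inner").drop("salt")
    joined.groupBy("k").count().show()

    spark.stop()
  }
}
```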

## How was this patch tested?
Unit tests in DatasetSuite.scala



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marymwu/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21783


commit 2a01c813b6ef7223a489a4bcda3c9e5feb899060
Author: wangsm9 
Date:   2018-07-16T09:48:44Z

data skew code for spark2.3







[GitHub] spark pull request #21759: sfas

2018-07-13 Thread marymwu
Github user marymwu closed the pull request at:

https://github.com/apache/spark/pull/21759





[GitHub] spark pull request #21759: sfas

2018-07-13 Thread marymwu
GitHub user marymwu opened a pull request:

https://github.com/apache/spark/pull/21759

sfas

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marymwu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21759


commit dcf36ad54598118408c1425e81aa6552f42328c8
Author: Dongjoon Hyun 
Date:   2016-05-03T13:02:04Z

[SPARK-15057][GRAPHX] Remove stale TODO comment for making `enum` in 
GraphGenerators

This PR removes a stale TODO comment in `GraphGenerators.scala`

Just comment removed.

Author: Dongjoon Hyun 

Closes #12839 from dongjoon-hyun/SPARK-15057.

(cherry picked from commit 46965cd014fd4ba68bdec15156ec9bcc27d9b217)
Signed-off-by: Reynold Xin 

commit 1dc30f189ac30f070068ca5f60b7b4c85f2adc9e
Author: Bryan Cutler 
Date:   2016-05-19T02:48:36Z

[DOC][MINOR] ml.feature Scala and Python API sync

I reviewed Scala and Python APIs for ml.feature and corrected discrepancies.

Built docs locally, ran style checks

Author: Bryan Cutler 

Closes #13159 from BryanCutler/ml.feature-api-sync.

(cherry picked from commit b1bc5ebdd52ed12aea3fdc7b8f2fa2d00ea09c6b)
Signed-off-by: Reynold Xin 

commit 642f00980f1de13a0f6d1dc8bc7ed5b0547f3a9d
Author: Zheng RuiFeng 
Date:   2016-05-15T14:59:49Z

[MINOR] Fix Typos

1,Rename matrix args in BreezeUtil to upper to match the doc
2,Fix several typos in ML and SQL

manual tests

Author: Zheng RuiFeng 

Closes #13078 from zhengruifeng/fix_ann.

(cherry picked from commit c7efc56c7b6fc99c005b35c335716ff676856c6c)
Signed-off-by: Reynold Xin 

commit 2126fb0c2b2bb8ac4c5338df15182fcf8713fb2f
Author: Sandeep Singh 
Date:   2016-05-19T09:44:26Z

[CORE][MINOR] Remove redundant set master in 
OutputCommitCoordinatorIntegrationSuite

Remove redundant set master in OutputCommitCoordinatorIntegrationSuite, as 
we are already setting it in SparkContext below on line 43.

existing tests

Author: Sandeep Singh 

Closes #13168 from techaddict/minor-1.

(cherry picked from commit 3facca5152e685d9c7da96bff5102169740a4a06)
Signed-off-by: Reynold Xin 

commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317
Author: Andrew Or 
Date:   2016-05-20T05:40:03Z

[HOTFIX] Test compilation error from 52b967f

commit dd0c7fb39cac44e8f0d73f9884fd1582c25e9cf4
Author: Reynold Xin 
Date:   2016-05-20T05:46:08Z

Revert "[HOTFIX] Test compilation error from 52b967f"

This reverts commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317.

commit f8d0177c31d43eab59a7535945f3dfa24e906273
Author: Davies Liu 
Date:   2016-05-18T23:02:52Z

Revert "[SPARK-15392][SQL] fix default value of size estimation of logical 
plan"

This reverts commit fc29b896dae08b957ed15fa681b46162600a4050.

(cherry picked from commit 84b23453ddb0a97e3d81306de0a5dcb64f88bdd0)
Signed-off-by: Reynold Xin 

commit 2ef645724a7f229309a87c5053b0fbdf45d06f52
Author: Takuya UESHIN 
Date:   2016-05-20T05:55:44Z

[SPARK-15313][SQL] EmbedSerializerInFilter rule should keep exprIds of 
output of surrounded SerializeFromObject.

## What changes were proposed in this pull request?

The following code:

```
val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
ds.filter(_._1 == "b").select(expr("_1").as[String]).foreach(println(_))
```

throws an Exception:

```
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: _1#420
 at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
 at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
 at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)

...
 Cause: java.lang.RuntimeException: Couldn't find _1#420 in [_1#416,_2#417]
 at scala.sys.package$.error(package.scala:27)
 at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:94)
 at 
org.apache.

[jira] [Created] (SPARK-24799) A solution of dealing with data skew in left,right,inner join

2018-07-13 Thread marymwu (JIRA)
marymwu created SPARK-24799:
---

 Summary: A solution of dealing with data skew in left,right,inner 
join
 Key: SPARK-24799
 URL: https://issues.apache.org/jira/browse/SPARK-24799
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: marymwu
 Fix For: 2.3.0


For left, right, and inner join execution, this solution mainly divides the 
partitions where data skew has occurred into several partitions with a smaller 
data scale, so that more tasks can be executed in parallel and efficiency is 
increased.






[jira] [Updated] (SPARK-17181) [Spark2.0 web ui]The status of the certain jobs is still displayed as running even if all the stages of this job have already finished

2016-08-22 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-17181:

Attachment: job1000-2.png
job1000-1.png

> [Spark2.0 web ui]The status of the certain jobs is still displayed as running 
> even if all the stages of this job have already finished 
> ---
>
> Key: SPARK-17181
> URL: https://issues.apache.org/jira/browse/SPARK-17181
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: job1000-1.png, job1000-2.png
>
>
> [Spark2.0 web ui]The status of the certain jobs is still displayed as running 
> even if all the stages of this job have already finished 
> Note: not sure what kind of jobs will encounter this problem
> The following log shows that job 1000 has already been done, but on spark2.0 
> web ui, the status of job 1000 is still displayed as running, see attached 
> file
> 16/08/22 16:01:29 INFO DAGScheduler: dag send msg, result task done, job: 1000
> 16/08/22 16:01:29 INFO DAGScheduler: Job 1000 finished: run at 
> AccessController.java:-2, took 4.664319 s






[jira] [Created] (SPARK-17181) [Spark2.0 web ui]The status of the certain jobs is still displayed as running even if all the stages of this job have already finished

2016-08-22 Thread marymwu (JIRA)
marymwu created SPARK-17181:
---

 Summary: [Spark2.0 web ui]The status of the certain jobs is still 
displayed as running even if all the stages of this job have already finished 
 Key: SPARK-17181
 URL: https://issues.apache.org/jira/browse/SPARK-17181
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.0.0
Reporter: marymwu
Priority: Minor


[Spark2.0 web ui] The status of certain jobs is still displayed as running even 
though all the stages of the job have already finished.

Note: not sure what kind of jobs will encounter this problem.

The following log shows that job 1000 has already finished, but on the Spark 2.0 
web UI the status of job 1000 is still displayed as running; see the attached file.
16/08/22 16:01:29 INFO DAGScheduler: dag send msg, result task done, job: 1000
16/08/22 16:01:29 INFO DAGScheduler: Job 1000 finished: run at 
AccessController.java:-2, took 4.664319 s






[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader

2016-08-22 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430243#comment-15430243
 ] 

marymwu commented on SPARK-5770:


Hey, we have run into the same issue too. We tried to fix it but failed. 
Can anybody help with this issue? Thanks so much!

> Use addJar() to upload a new jar file to executor, it can't be added to 
> classloader
> ---
>
> Key: SPARK-5770
> URL: https://issues.apache.org/jira/browse/SPARK-5770
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>Priority: Minor
>
> First use addJar() to upload a jar to the executor, then change the jar 
> content and upload it again. We can see the jar file in the local has be 
> updated, but the classloader still load the old one. The executor log has no 
> error or exception to point it.
> I use spark-shell to test it. And set "spark.files.overwrite" is true.






[jira] [Created] (SPARK-16970) [spark2.0] spark2.0 doesn't catch the java exception thrown by reflect function in sql statement which causes the job abort

2016-08-09 Thread marymwu (JIRA)
marymwu created SPARK-16970:
---

 Summary: [spark2.0] spark2.0 doesn't catch the java exception 
thrown by reflect function in sql statement which causes the job abort
 Key: SPARK-16970
 URL: https://issues.apache.org/jira/browse/SPARK-16970
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu


[spark2.0] Spark 2.0 doesn't catch the Java exception thrown by the reflect 
function in the SQL statement, which causes the job to abort.

Steps:
1. select reflect('java.net.URLDecoder','decode','%%E7','utf-8') test;
--> the "%%" causes the Java exception

error:
16/08/09 15:56:38 INFO DAGScheduler: Job 1 failed: run at 
AccessController.java:-2, took 7.018147 s
16/08/09 15:56:38 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 162 in 
stage 1.0 failed 8 times, most recent failure: Lost task 162.7 in stage 1.0 
(TID 207, slave7.lenovomm2.com): org.apache.spark.SparkException: Task failed 
while writing rows.
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:330)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection.eval(CallMethodViaReflection.scala:87)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:288)
... 8 more
Caused by: java.lang.IllegalArgumentException: URLDecoder: Illegal hex 
characters in escape (%) pattern - For input string: "%E"
at java.net.URLDecoder.decode(URLDecoder.java:192)
... 19 more






[jira] [Created] (SPARK-16833) [Spark2.0]when creating temporary function,command "add jar" doesn't work unless restart spark

2016-08-01 Thread marymwu (JIRA)
marymwu created SPARK-16833:
---

 Summary: [Spark2.0]when creating temporary function,command "add 
jar" doesn't work unless restart spark 
 Key: SPARK-16833
 URL: https://issues.apache.org/jira/browse/SPARK-16833
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu


[Spark2.0] When creating a temporary function, the command "add jar" doesn't work 
unless Spark is restarted.
Steps:
1. add jar /tmp/GeoIP-0.6.8.jar;
2. create temporary function GeoIP2 as 
'com.lenovo.lps.device.hive.udf.UDFGeoIP';
3. select GeoIP2('tdy');
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 527.0 failed 8 times, most recent failure: Lost task 0.7 in stage 
527.0 (TID 140171, smokeslave2.avatar.lenovomm.com): 
java.lang.RuntimeException: Stream '/jars/GeoIP-0.6.8.jar'' was not found.

Note: After restarting Spark, it works.







[jira] [Commented] (SPARK-16601) Spark2.0 fail in creating table using sql statement "create table `db.tableName` xxx" while spark1.6 supports

2016-08-01 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401599#comment-15401599
 ] 

marymwu commented on SPARK-16601:
-

Got it, thanks. BTW, are there some grammar changes in Spark 2.0 compared with Spark 1.6?

> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports
> -
>
> Key: SPARK-16601
> URL: https://issues.apache.org/jira/browse/SPARK-16601
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: error log.png
>
>
> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports.
> error log is attached.






[jira] [Commented] (SPARK-16603) Spark2.0 fail in executing the sql statement which field name begins with number,like "d.30_day_loss_user" while spark1.6 supports

2016-08-01 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401598#comment-15401598
 ] 

marymwu commented on SPARK-16603:
-

OK, got it, thanks.

> Spark2.0 fail in executing the sql statement which field name begins with 
> number,like "d.30_day_loss_user" while spark1.6 supports
> --
>
> Key: SPARK-16603
> URL: https://issues.apache.org/jira/browse/SPARK-16603
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
>
> Spark2.0 fail in executing the sql statement which field name begins with 
> number,like "d.30_day_loss_user" while spark1.6 supports
> Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 
> '.30' expecting
> {')', ','}






[jira] [Commented] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports

2016-08-01 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401597#comment-15401597
 ] 

marymwu commented on SPARK-16605:
-

I see, thanks!

> Spark2.0 cannot "select" data from a table stored as an orc file which has 
> been created by hive while hive or spark1.6 supports
> ---
>
> Key: SPARK-16605
> URL: https://issues.apache.org/jira/browse/SPARK-16605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: screenshot-1.png
>
>
> Spark2.0 cannot "select" data from a table stored as an orc file which has 
> been created by hive while hive or spark1.6 supports
> Steps:
> 1. Use hive to create a table "tbtxt" stored as txt and load data into it.
> 2. Use hive to create a table "tborc" stored as orc and insert the data from 
> table "tbtxt" . Example, "create table tborc stored as orc as select * from 
> tbtxt"
> 3. Use spark2.0 to "select * from tborc;".-->error 
> occurs,java.lang.IllegalArgumentException: Field "nid" does not exist.






[jira] [Commented] (SPARK-16601) Spark2.0 fail in creating table using sql statement "create table `db.tableName` xxx" while spark1.6 supports

2016-07-19 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383625#comment-15383625
 ] 

marymwu commented on SPARK-16601:
-

I'd like to create a table in a named DB

> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports
> -
>
> Key: SPARK-16601
> URL: https://issues.apache.org/jira/browse/SPARK-16601
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: error log.png
>
>
> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports.
> error log is attached.






[jira] [Updated] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports

2016-07-18 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16605:

Attachment: screenshot-1.png

> Spark2.0 cannot "select" data from a table stored as an orc file which has 
> been created by hive while hive or spark1.6 supports
> ---
>
> Key: SPARK-16605
> URL: https://issues.apache.org/jira/browse/SPARK-16605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: screenshot-1.png
>
>
> Spark2.0 cannot "select" data from a table stored as an orc file which has 
> been created by hive while hive or spark1.6 supports
> Steps:
> 1. Use hive to create a table "tbtxt" stored as txt and load data into it.
> 2. Use hive to create a table "tborc" stored as orc and insert the data from 
> table "tbtxt" . Example, "create table tborc stored as orc as select * from 
> tbtxt"
> 3. Use spark2.0 to "select * from tborc;".-->error 
> occurs,java.lang.IllegalArgumentException: Field "nid" does not exist.






[jira] [Created] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports

2016-07-18 Thread marymwu (JIRA)
marymwu created SPARK-16605:
---

 Summary: Spark2.0 cannot "select" data from a table stored as an 
orc file which has been created by hive while hive or spark1.6 supports
 Key: SPARK-16605
 URL: https://issues.apache.org/jira/browse/SPARK-16605
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: marymwu


Spark 2.0 cannot "select" data from a table stored as an ORC file which has been 
created by Hive, while Hive and Spark 1.6 can.

Steps:
1. Use Hive to create a table "tbtxt" stored as txt and load data into it.
2. Use Hive to create a table "tborc" stored as ORC and insert the data from 
table "tbtxt". Example: "create table tborc stored as orc as select * from tbtxt"
3. Use Spark 2.0 to run "select * from tborc;" --> an error occurs: 
java.lang.IllegalArgumentException: Field "nid" does not exist.






[jira] [Created] (SPARK-16604) Spark2.0 fail in executing the sql statement which includes partition field in the "select" statement while spark1.6 supports

2016-07-18 Thread marymwu (JIRA)
marymwu created SPARK-16604:
---

 Summary: Spark2.0 fail in executing the sql statement which 
includes partition field in the "select" statement while spark1.6 supports
 Key: SPARK-16604
 URL: https://issues.apache.org/jira/browse/SPARK-16604
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: marymwu


Spark 2.0 fails to execute the SQL statement which includes the partition field in 
the "select" list.

error:
16/07/14 16:10:47 INFO HiveThriftServer2: set 
sessionId(69e92ba1-4be2-4be9-bc81-7a00c5802ef8) to 
exeId(c93f69b0-0f6e-4f07-afdc-ca6c41045fa3)
16/07/14 16:10:47 INFO SparkSqlParser: Parsing command: INSERT OVERWRITE TABLE 
d_avatar.RPS__H_REPORT_MORE_DIMENSION_MORE_NORM_FIRST_CHANNEL_VCD_IMPALA 
PARTITION(p_event_date='2016-07-13')
select 
app_key,
app_version,
app_channel,
device_model,
total_num,
new_num,
active_num,
extant_num,
visits_num,
start_num,
p_event_date
from RPS__H_REPORT_MORE_DIMENSION_MORE_NORM_FIRST_CHANNEL_VCD where 
p_event_date = '2016-07-13'
16/07/14 16:10:47 INFO ThriftHttpServlet: Could not validate cookie sent, will 
try to generate a new cookie
16/07/14 16:10:47 INFO ThriftHttpServlet: Cookie added for clientUserName hive
16/07/14 16:10:47 INFO HiveMetaStore: 108: get_table : db=default 
tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd
16/07/14 16:10:47 INFO audit: ugi=u_reaper  ip=unknown-ip-addr  
cmd=get_table : db=default 
tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd 
16/07/14 16:10:47 INFO HiveMetaStore: 108: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore
16/07/14 16:10:47 INFO ObjectStore: ObjectStore, initialize called
16/07/14 16:10:47 INFO ThriftHttpServlet: Could not validate cookie sent, will 
try to generate a new cookie
16/07/14 16:10:47 INFO ThriftHttpServlet: Cookie added for clientUserName hive
16/07/14 16:10:47 INFO Query: Reading in results for query 
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is 
closing
16/07/14 16:10:47 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
MYSQL
16/07/14 16:10:47 INFO ObjectStore: Initialized ObjectStore
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO HiveMetaStore: 108: get_table : db=d_avatar 
tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd_impala
16/07/14 16:10:47 INFO audit: ugi=u_reaper  ip=unknown-ip-addr  
cmd=get_table : db=d_avatar 
tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd_impala 
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:49 WARN HiveSessionState$$anon$1: Max iterations (100) reached 
for batch Resolution
16/07/14 16:10:49 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
org.apache.spark.sql.AnalysisException: unresolved operator 'InsertIntoTable 
MetastoreRelation d_avatar, 
rps__h_report_more_dimension_more_norm_first_channel_vcd_impala, None, 
Map(p_event_date -> Some(2016-07-13)), true, false;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:56)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:309)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:51)
at 
org.apache.spark.sql.

[jira] [Created] (SPARK-16603) Spark2.0 fail in executing the sql statement which field name begins with number,like "d.30_day_loss_user" while spark1.6 supports

2016-07-18 Thread marymwu (JIRA)
marymwu created SPARK-16603:
---

 Summary: Spark2.0 fail in executing the sql statement which field 
name begins with number,like "d.30_day_loss_user" while spark1.6 supports
 Key: SPARK-16603
 URL: https://issues.apache.org/jira/browse/SPARK-16603
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: marymwu
Priority: Minor


Spark 2.0 fails to execute an SQL statement in which a field name begins with a 
number, like "d.30_day_loss_user", while Spark 1.6 supports it.

Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 
'.30' expecting
{')', ','}
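
A hedged workaround sketch, not a fix in Spark itself: back-quoting an identifier that starts with a digit usually lets the parser accept it (run e.g. in spark-shell where `spark` is the session; `some_table` and the alias `d` are hypothetical):

```scala
// Backticks quote the digit-leading column name so the parser does not read "30" as a number.
spark.sql("select d.`30_day_loss_user` from some_table d")
```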






[jira] [Created] (SPARK-16602) Spark2.0-error occurs when execute the sql statement which includes "nvl" function while spark1.6 supports

2016-07-18 Thread marymwu (JIRA)
marymwu created SPARK-16602:
---

 Summary: Spark2.0-error occurs when execute the sql statement 
which includes "nvl" function while spark1.6 supports
 Key: SPARK-16602
 URL: https://issues.apache.org/jira/browse/SPARK-16602
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: marymwu


Spark 2.0: an error occurs when executing an SQL statement that includes the "nvl" 
function, while Spark 1.6 supports it.

Error: org.apache.spark.sql.AnalysisException: cannot resolve 
'nvl(b.`new_user`, 0)' due to data type mismatch: input to function coalesce 
should all be the same type, but it's [string, int]; line 2 pos 73 
(state=,code=0)
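
A hedged workaround sketch based on the error message above: nvl is resolved to coalesce, which requires all inputs to share one type, so casting the column (or quoting the default) aligns the types (run e.g. in spark-shell; `some_table` and the assumed string column type are illustrative):

```scala
// Assuming b.new_user is stored as string: cast it so both coalesce inputs are int.
spark.sql("select nvl(cast(b.new_user as int), 0) as new_user from some_table b")
```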






[jira] [Updated] (SPARK-16601) Spark2.0 fail in creating table using sql statement "create table `db.tableName` xxx" while spark1.6 supports

2016-07-18 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16601:

Attachment: error log.png

> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports
> -
>
> Key: SPARK-16601
> URL: https://issues.apache.org/jira/browse/SPARK-16601
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: error log.png
>
>
> Spark2.0 fail in creating table using sql statement "create table 
> `db.tableName` xxx" while spark1.6 supports.
> error log is attached.






[jira] [Created] (SPARK-16601) Spark2.0 fail in creating table using sql statement "create table `db.tableName` xxx" while spark1.6 supports

2016-07-18 Thread marymwu (JIRA)
marymwu created SPARK-16601:
---

 Summary: Spark2.0 fail in creating table using sql statement 
"create table `db.tableName` xxx" while spark1.6 supports
 Key: SPARK-16601
 URL: https://issues.apache.org/jira/browse/SPARK-16601
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: marymwu
Priority: Minor


Spark 2.0 fails to create a table using the SQL statement "create table 
`db.tableName` xxx", while Spark 1.6 supports it.

The error log is attached.






[jira] [Created] (HIVE-14160) Reduce-task costs a long time to finish on the condition that the certain sql "select a,distinct(b) group by a" has been executed on the data which has skew distribution

2016-07-05 Thread marymwu (JIRA)
marymwu created HIVE-14160:
--

 Summary: Reduce-task costs a long time to finish on the condition 
that the certain sql "select a,distinct(b) group by a" has been executed on the 
data which has skew distribution
 Key: HIVE-14160
 URL: https://issues.apache.org/jira/browse/HIVE-14160
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Affects Versions: 1.1.0
Reporter: marymwu


The reduce task takes a long time to finish when the SQL "select a,distinct(b) 
group by a" is executed on data that has a skewed distribution.

data scale: 64G
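
A common mitigation sketch, assuming the intended query is a per-`a` count of distinct `b` values: aggregate in two steps so the de-duplication work is spread across (a, b) pairs before collapsing to `a` (shown here as a Spark-SQL-flavored rewrite of the same idea; `some_table` is hypothetical):

```scala
// Step 1 de-duplicates (a, b) pairs; step 2 counts the surviving rows per a.
spark.sql("""
  select a, count(*) as distinct_b_count
  from (select a, b from some_table group by a, b) t
  group by a
""")
```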





[jira] [Created] (SPARK-16376) [Spark web UI]:HTTP ERROR 500 when using rest api "/applications/[app-id]/jobs" if array "stageIds" is empty

2016-07-05 Thread marymwu (JIRA)
marymwu created SPARK-16376:
---

 Summary: [Spark web UI]:HTTP ERROR 500 when using rest api 
"/applications/[app-id]/jobs" if array "stageIds" is empty
 Key: SPARK-16376
 URL: https://issues.apache.org/jira/browse/SPARK-16376
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.0.0
        Reporter: marymwu


[Spark web UI]: HTTP ERROR 500 when using the REST API "/applications/[app-id]/jobs" 
if the array "stageIds" is empty.

See attachment for reference.
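
Not the actual fix, just a minimal sketch of the defensive pattern that avoids calling max on an empty collection (the `stageIds` name follows the stack trace below; everything else is illustrative):

```scala
// Seq.empty.max throws java.lang.UnsupportedOperationException: empty.max,
// so take the maximum only when the collection is non-empty.
val stageIds: Seq[Int] = Seq.empty
val lastStageId: Option[Int] = if (stageIds.isEmpty) None else Some(stageIds.max)
```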

HTTP ERROR 500

Problem accessing /api/v1/applications/application_1466239933301_175531/jobs. 
Reason:

Server Error

Caused by:

java.lang.UnsupportedOperationException: empty.max
at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:216)
at scala.collection.AbstractTraversable.max(Traversable.scala:105)
at 
org.apache.spark.status.api.v1.AllJobsResource$.convertJobData(AllJobsResource.scala:71)
at 
org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2$$anonfun$apply$2.apply(AllJobsResource.scala:46)
at 
org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2$$anonfun$apply$2.apply(AllJobsResource.scala:44)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
at 
org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2.apply(AllJobsResource.scala:44)
at 
org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2.apply(AllJobsResource.scala:43)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$flatMap$2.apply(TraversableLike.scala:753)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:752)
at 
org.apache.spark.status.api.v1.AllJobsResource.jobsList(AllJobsResource.scala:43)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:134)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at 
org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at 
org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
at 
org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at 
org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.spark-project.jetty.servlet.ServletHandler.doScope(Servle

[jira] [Updated] (SPARK-16375) [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks

2016-07-05 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16375:

Attachment: numSkippedTasksWrongValue.png

> [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the 
> variable numSkippedTasks
> ---
>
> Key: SPARK-16375
> URL: https://issues.apache.org/jira/browse/SPARK-16375
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: numSkippedTasksWrongValue.png
>
>
> [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the 
> variable numSkippedTasks
> See attachment for reference.






[jira] [Created] (SPARK-16375) [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks

2016-07-05 Thread marymwu (JIRA)
marymwu created SPARK-16375:
---

 Summary: [Spark web UI]:The wrong value(numCompletedTasks) has 
been assigned to the variable numSkippedTasks
 Key: SPARK-16375
 URL: https://issues.apache.org/jira/browse/SPARK-16375
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.0.0
Reporter: marymwu


[Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the 
variable numSkippedTasks

See attachment for reference.






[jira] [Updated] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition

2016-07-03 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16089:

Component/s: SQL

> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports
> ---
>
> Key: SPARK-16089
> URL: https://issues.apache.org/jira/browse/SPARK-16089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: StaticPartitionSQLStatementError.png
>
>
> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports.
> Testcase:
> "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION 
> (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where 
> dt = '2016-06-21';"
> Error: org.apache.spark.sql.AnalysisException: unresolved operator 
> 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt 
> -> Some(2016-06-21)), true, false; (state=,code=0)
> see attachment for reference.
> Note:
> The same SQL statement succeeded in Spark 1.6.






[jira] [Updated] (SPARK-16092) Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does

2016-07-03 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16092:

Component/s: SQL

> Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict 
> as a global variable in Spark2.0 configuration file while Spark1.6 does
> 
>
> Key: SPARK-16092
> URL: https://issues.apache.org/jira/browse/SPARK-16092
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict 
> as a global variable in Spark2.0 configuration file while Spark1.6 does
> Precondition:
> set hive.exec.dynamic.partition.mode=nonstrict as a global variable in 
> Spark2.0 configuration file
> "
> hive.exec.dynamic.partition.mode
> nonstrict
> "
> Testcase:
> "insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt)  select 
> t.nid, t.price, t.dt from (select nid, price, dt from 
> d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t 
> group by t.nid, t.price, t.dt;"
> Result:
> Error: org.apache.spark.SparkException: Dynamic partition strict mode 
> requires at least one static partition column. To turn this off set 
> hive.exec.dynamic.partition.mode=nonstrict (state=,code=0)
> Note:
> Spark1.6 supports the above SQL statement after set 
> hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark 
> configuration file
> "
> hive.exec.dynamic.partition.mode
> nonstrict
> "






[jira] [Updated] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-07-03 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16093:

Component/s: SQL

> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> --
>
> Key: SPARK-16093
> URL: https://issues.apache.org/jira/browse/SPARK-16093
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Errorlog.txt
>
>
> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> Precondition:
> set spark.sql.autoBroadcastJoinThreshold = 1;
> Testcase:
> "INSERT OVERWRITE TABLE 
> RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION 
> (p_event_date='2016-06-18')
> select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from
> (select app_key,app_channel,lps_did from 
> RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a 
> join 
> (select app_key,lps_did,device_model, count(1) as visits from 
> RPS__H_REPORT_MORE_DIMENSION_SMALL where  p_event_date = '2016-06-18'
> and ( log_type=1 or  log_type=2)  
> group by  app_key,lps_did,device_model) b  
> on a.lps_did = b.lps_did and a.app_key=b.app_key 
> group by a.app_key,a.app_channel,b.device_model;
> "
> == Physical Plan ==
> InsertIntoHiveTable MetastoreRelation default, 
> rps__h_report_more_dimension_first_channel_visit_cd_day, None, 
> Map(p_event_date -> Some(2016-06-18)), true, false
> +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Final,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
>+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, 
> app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle 
> partition size: 5])
>   +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,sum#41L])
>  +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
> +- BroadcastHashJoin [lps_did#8,app_key#7], 
> [lps_did#13,app_key#12], Inner, BuildRight, None
>:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
>:  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], 
> MetastoreRelation default, 
> rps__h_report_more_dimension_earliest_newuser_list_c, None
>+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, 
> string], input[0, string]))
>   +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Final,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,visits#3L])
>  +- Exchange(coordinator id: 733045095) 
> hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), 
> Some(coordinator[target post-shuffle partition size: 5])
> +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Partial,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,count#39L])
>+- Project [app_key#12,lps_did#13,device_model#20]
>   +- Filter ((isnotnull(app_key#12) && 
> isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || 
> (cast(log_type#11 as int) = 2)))
>  +- HiveTableScan 
> [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation 
> default, rps__h_report_more_dimension_small, None, 
> [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
> Time taken: 4.775 seconds, Fetched 1 row(s)
> 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
> Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], 
> Inner, BuildRight, None
> Result:
> 1. Execution failed, spark service is unavailable.
> 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, 
> BroadcastHashJoin has been used when join two large tables.
> Error log is as attached.






[jira] [Created] (SPARK-16255) Spark2.0 doesn't support the following SQL statement:"insert into directory "/u_qa_user/hive_testdata/test1/t1" select * from d_test_tpc_2g_txt.auction" while Hive suppo

2016-06-28 Thread marymwu (JIRA)
marymwu created SPARK-16255:
---

 Summary: Spark2.0 doesn't support the following SQL 
statement:"insert into directory "/u_qa_user/hive_testdata/test1/t1" select * 
from d_test_tpc_2g_txt.auction" while Hive supports
 Key: SPARK-16255
 URL: https://issues.apache.org/jira/browse/SPARK-16255
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
        Reporter: marymwu


Spark 2.0 doesn't support the following SQL statement: "insert into directory 
"/u_qa_user/hive_testdata/test1/t1" select * from d_test_tpc_2g_txt.auction", 
while Hive supports it.






[jira] [Updated] (SPARK-16254) Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is more than Total

2016-06-28 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16254:

Affects Version/s: 2.0.0

> Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is 
> more than Total 
> --
>
> Key: SPARK-16254
> URL: https://issues.apache.org/jira/browse/SPARK-16254
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: Reference.png
>
>
> Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is 
> more than Total 
> See attachment






[jira] [Updated] (SPARK-16254) Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is more than Total

2016-06-28 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16254:

Attachment: Reference.png

> Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is 
> more than Total 
> --
>
> Key: SPARK-16254
> URL: https://issues.apache.org/jira/browse/SPARK-16254
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: marymwu
>Priority: Minor
> Attachments: Reference.png
>
>
> Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is 
> more than Total 
> See attachment






[jira] [Created] (SPARK-16254) Spark2.0 monitor web ui->Tasks (for all stages)->the number of Succeed is more than Total

2016-06-28 Thread marymwu (JIRA)
marymwu created SPARK-16254:
---

 Summary: Spark2.0 monitor web ui->Tasks (for all stages)->the 
number of Succeed is more than Total 
 Key: SPARK-16254
 URL: https://issues.apache.org/jira/browse/SPARK-16254
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: marymwu
Priority: Minor


Spark 2.0 monitoring web UI -> Tasks (for all stages) -> the number of Succeeded 
tasks is greater than Total.

See attachment






[jira] [Commented] (SPARK-16092) Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does

2016-06-27 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350764#comment-15350764
 ] 

marymwu commented on SPARK-16092:
-

Can anybody help?

> Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict 
> as a global variable in Spark2.0 configuration file while Spark1.6 does
> 
>
> Key: SPARK-16092
> URL: https://issues.apache.org/jira/browse/SPARK-16092
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict 
> as a global variable in Spark2.0 configuration file while Spark1.6 does
> Precondition:
> set hive.exec.dynamic.partition.mode=nonstrict as a global variable in 
> Spark2.0 configuration file
> "
> hive.exec.dynamic.partition.mode
> nonstrict
> "
> Testcase:
> "insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt)  select 
> t.nid, t.price, t.dt from (select nid, price, dt from 
> d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t 
> group by t.nid, t.price, t.dt;"
> Result:
> Error: org.apache.spark.SparkException: Dynamic partition strict mode 
> requires at least one static partition column. To turn this off set 
> hive.exec.dynamic.partition.mode=nonstrict (state=,code=0)
> Note:
> Spark1.6 supports the above SQL statement after set 
> hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark 
> configuration file
> "
> hive.exec.dynamic.partition.mode
> nonstrict
> "






[jira] [Commented] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partitio

2016-06-27 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350765#comment-15350765
 ] 

marymwu commented on SPARK-16089:
-

Can anybody help?

> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports
> ---
>
> Key: SPARK-16089
> URL: https://issues.apache.org/jira/browse/SPARK-16089
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: StaticPartitionSQLStatementError.png
>
>
> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports.
> Testcase:
> "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION 
> (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where 
> dt = '2016-06-21';"
> Error: org.apache.spark.sql.AnalysisException: unresolved operator 
> 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt 
> -> Some(2016-06-21)), true, false; (state=,code=0)
> see attachment for reference.
> Note:
> The same SQL statement succeeded in Spark 1.6.






[jira] [Commented] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-06-27 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350763#comment-15350763
 ] 

marymwu commented on SPARK-16093:
-

Can anybody help?

> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> --
>
> Key: SPARK-16093
> URL: https://issues.apache.org/jira/browse/SPARK-16093
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>    Reporter: marymwu
> Attachments: Errorlog.txt
>
>
> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> Precondition:
> set spark.sql.autoBroadcastJoinThreshold = 1;
> Testcase:
> "INSERT OVERWRITE TABLE 
> RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION 
> (p_event_date='2016-06-18')
> select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from
> (select app_key,app_channel,lps_did from 
> RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a 
> join 
> (select app_key,lps_did,device_model, count(1) as visits from 
> RPS__H_REPORT_MORE_DIMENSION_SMALL where  p_event_date = '2016-06-18'
> and ( log_type=1 or  log_type=2)  
> group by  app_key,lps_did,device_model) b  
> on a.lps_did = b.lps_did and a.app_key=b.app_key 
> group by a.app_key,a.app_channel,b.device_model;
> "
> == Physical Plan ==
> InsertIntoHiveTable MetastoreRelation default, 
> rps__h_report_more_dimension_first_channel_visit_cd_day, None, 
> Map(p_event_date -> Some(2016-06-18)), true, false
> +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Final,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
>+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, 
> app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle 
> partition size: 5])
>   +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,sum#41L])
>  +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
> +- BroadcastHashJoin [lps_did#8,app_key#7], 
> [lps_did#13,app_key#12], Inner, BuildRight, None
>:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
>:  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], 
> MetastoreRelation default, 
> rps__h_report_more_dimension_earliest_newuser_list_c, None
>+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, 
> string], input[0, string]))
>   +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Final,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,visits#3L])
>  +- Exchange(coordinator id: 733045095) 
> hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), 
> Some(coordinator[target post-shuffle partition size: 5])
> +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Partial,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,count#39L])
>+- Project [app_key#12,lps_did#13,device_model#20]
>   +- Filter ((isnotnull(app_key#12) && 
> isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || 
> (cast(log_type#11 as int) = 2)))
>  +- HiveTableScan 
> [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation 
> default, rps__h_report_more_dimension_small, None, 
> [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
> Time taken: 4.775 seconds, Fetched 1 row(s)
> 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
> Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], 
> Inner, BuildRight, None
> Result:
> 1. Execution failed, spark service is unavailable.
> 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, 
> BroadcastHashJoin has been used when join two large tables.
> Error log is as attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-06-21 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16093:

Attachment: Errorlog.txt

> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> --
>
> Key: SPARK-16093
> URL: https://issues.apache.org/jira/browse/SPARK-16093
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>    Reporter: marymwu
> Attachments: Errorlog.txt
>
>
> Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
> Precondition:
> set spark.sql.autoBroadcastJoinThreshold = 1;
> Testcase:
> "INSERT OVERWRITE TABLE 
> RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION 
> (p_event_date='2016-06-18')
> select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from
> (select app_key,app_channel,lps_did from 
> RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a 
> join 
> (select app_key,lps_did,device_model, count(1) as visits from 
> RPS__H_REPORT_MORE_DIMENSION_SMALL where  p_event_date = '2016-06-18'
> and ( log_type=1 or  log_type=2)  
> group by  app_key,lps_did,device_model) b  
> on a.lps_did = b.lps_did and a.app_key=b.app_key 
> group by a.app_key,a.app_channel,b.device_model;
> "
> == Physical Plan ==
> InsertIntoHiveTable MetastoreRelation default, 
> rps__h_report_more_dimension_first_channel_visit_cd_day, None, 
> Map(p_event_date -> Some(2016-06-18)), true, false
> +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Final,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
>+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, 
> app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle 
> partition size: 5])
>   +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
> functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], 
> output=[app_key#7,app_channel#9,device_model#20,sum#41L])
>  +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
> +- BroadcastHashJoin [lps_did#8,app_key#7], 
> [lps_did#13,app_key#12], Inner, BuildRight, None
>:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
>:  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], 
> MetastoreRelation default, 
> rps__h_report_more_dimension_earliest_newuser_list_c, None
>+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, 
> string], input[0, string]))
>   +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Final,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,visits#3L])
>  +- Exchange(coordinator id: 733045095) 
> hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), 
> Some(coordinator[target post-shuffle partition size: 5])
> +- 
> TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
> functions=[(count(1),mode=Partial,isDistinct=false)], 
> output=[app_key#12,lps_did#13,device_model#20,count#39L])
>+- Project [app_key#12,lps_did#13,device_model#20]
>   +- Filter ((isnotnull(app_key#12) && 
> isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || 
> (cast(log_type#11 as int) = 2)))
>  +- HiveTableScan 
> [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation 
> default, rps__h_report_more_dimension_small, None, 
> [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
> Time taken: 4.775 seconds, Fetched 1 row(s)
> 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
> Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], 
> Inner, BuildRight, None
> Result:
> 1. Execution failed, spark service is unavailable.
> 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, 
> BroadcastHashJoin has been used when join two large tables.
> Error log is as attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-06-21 Thread marymwu (JIRA)
marymwu created SPARK-16093:
---

 Summary: Spark2.0 take no effect after set 
spark.sql.autoBroadcastJoinThreshold = 1
 Key: SPARK-16093
 URL: https://issues.apache.org/jira/browse/SPARK-16093
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu


Setting spark.sql.autoBroadcastJoinThreshold = 1 takes no effect in Spark 2.0.

Precondition:
set spark.sql.autoBroadcastJoinThreshold = 1;

Testcase:
"INSERT OVERWRITE TABLE RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY 
PARTITION (p_event_date='2016-06-18')
select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from
(select app_key,app_channel,lps_did from 
RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a 
join 
(select app_key,lps_did,device_model, count(1) as visits from 
RPS__H_REPORT_MORE_DIMENSION_SMALL where  p_event_date = '2016-06-18'
and ( log_type=1 or  log_type=2)  
group by  app_key,lps_did,device_model) b  
on a.lps_did = b.lps_did and a.app_key=b.app_key 
group by a.app_key,a.app_channel,b.device_model;
"
== Physical Plan ==
InsertIntoHiveTable MetastoreRelation default, 
rps__h_report_more_dimension_first_channel_visit_cd_day, None, Map(p_event_date 
-> Some(2016-06-18)), true, false
+- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
functions=[(sum(visits#3L),mode=Final,isDistinct=false)], 
output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
   +- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, 
app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle 
partition size: 5])
  +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], 
functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], 
output=[app_key#7,app_channel#9,device_model#20,sum#41L])
 +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
+- BroadcastHashJoin [lps_did#8,app_key#7], 
[lps_did#13,app_key#12], Inner, BuildRight, None
   :- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
   :  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], 
MetastoreRelation default, 
rps__h_report_more_dimension_earliest_newuser_list_c, None
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, 
string], input[0, string]))
  +- 
TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
functions=[(count(1),mode=Final,isDistinct=false)], 
output=[app_key#12,lps_did#13,device_model#20,visits#3L])
 +- Exchange(coordinator id: 733045095) 
hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), 
Some(coordinator[target post-shuffle partition size: 5])
+- 
TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], 
functions=[(count(1),mode=Partial,isDistinct=false)], 
output=[app_key#12,lps_did#13,device_model#20,count#39L])
   +- Project [app_key#12,lps_did#13,device_model#20]
  +- Filter ((isnotnull(app_key#12) && 
isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || (cast(log_type#11 
as int) = 2)))
 +- HiveTableScan 
[app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation default, 
rps__h_report_more_dimension_small, None, 
[isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
Time taken: 4.775 seconds, Fetched 1 row(s)
16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], 
Inner, BuildRight, None

Result:
1. Execution failed; the Spark service became unavailable.
2. Even though spark.sql.autoBroadcastJoinThreshold = 1 was set, BroadcastHashJoin 
was still used when joining two large tables.

The error log is attached.
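
For reference, a minimal, self-contained sketch of how the threshold behaviour can be 
checked on a toy join; the session, table names and sizes below are placeholders, not 
the reporter's job. The threshold is a size in bytes, and -1 disables broadcast joins 
outright.

    import org.apache.spark.sql.SparkSession

    // Toy check: with broadcasting disabled, the physical plan is expected to
    // show SortMergeJoin rather than BroadcastHashJoin.
    val spark = SparkSession.builder()
      .appName("broadcast-threshold-check")
      .master("local[*]")
      .getOrCreate()

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

    spark.range(1000).createOrReplaceTempView("t1")
    spark.range(1000).createOrReplaceTempView("t2")

    spark.sql("SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id").explain()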





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16092) Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does

2016-06-21 Thread marymwu (JIRA)
marymwu created SPARK-16092:
---

 Summary: Spark2.0 take no effect after set 
hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 
configuration file while Spark1.6 does
 Key: SPARK-16092
 URL: https://issues.apache.org/jira/browse/SPARK-16092
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu


Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable in the Spark 2.0 
configuration file takes no effect, while it does take effect in Spark 1.6.

Precondition:
set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 
configuration file
"
hive.exec.dynamic.partition.mode
nonstrict
"

Testcase:
"insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt)  select 
t.nid, t.price, t.dt from (select nid, price, dt from 
d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t 
group by t.nid, t.price, t.dt;"

Result:
Error: org.apache.spark.SparkException: Dynamic partition strict mode requires 
at least one static partition column. To turn this off set 
hive.exec.dynamic.partition.mode=nonstrict (state=,code=0)

Note:
Spark 1.6 supports the above SQL statement after 
hive.exec.dynamic.partition.mode=nonstrict is set as a global variable in the Spark 
configuration file:
"
hive.exec.dynamic.partition.mode
nonstrict
"




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition

2016-06-21 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-16089:

Attachment: StaticPartitionSQLStatementError.png

> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports
> ---
>
> Key: SPARK-16089
> URL: https://issues.apache.org/jira/browse/SPARK-16089
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>Priority: Minor
> Attachments: StaticPartitionSQLStatementError.png
>
>
> Spark2.0 doesn't support the certain static partition SQL statment as "insert 
> overwrite table targetTB PARTITION (partition field=xx) select 
> field1,field2,...,partition field from sourceTB where partition field=xx" 
> while Spark 1.6 supports.
> Testcase:
> "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION 
> (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where 
> dt = '2016-06-21';"
> Error: org.apache.spark.sql.AnalysisException: unresolved operator 
> 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt 
> -> Some(2016-06-21)), true, false; (state=,code=0)
> see attachment for reference.
> Note:
> The same SQL statement succeeded in Spark 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition

2016-06-21 Thread marymwu (JIRA)
marymwu created SPARK-16089:
---

 Summary: Spark2.0 doesn't support the certain static partition SQL 
statment as "insert overwrite table targetTB PARTITION (partition field=xx) 
select field1,field2,...,partition field from sourceTB where partition 
field=xx" while Spark 1.6 supports
 Key: SPARK-16089
 URL: https://issues.apache.org/jira/browse/SPARK-16089
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu
Priority: Minor


Spark 2.0 doesn't support a static partition SQL statement of the form "insert 
overwrite table targetTB PARTITION (partition field=xx) select 
field1,field2,...,partition field from sourceTB where partition field=xx", while 
Spark 1.6 does.

Testcase:
"insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION (dt='2016-06-21') 
select nid, price, dt from d_test_tpc_2g_txt.marytest where dt = '2016-06-21';"

Error: org.apache.spark.sql.AnalysisException: unresolved operator 
'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt 
-> Some(2016-06-21)), true, false; (state=,code=0)

see attachment for reference.

Note:
The same SQL statement succeeded in Spark 1.6.
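
A hedged workaround sketch, not an official fix: with a static PARTITION spec the 
partition value is already fixed by the clause, so the usual HiveQL form omits the 
partition column from the select list. The table names are taken from the test case 
above; the SparkSession setup is a placeholder.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("static-partition-insert")
      .enableHiveSupport()
      .getOrCreate()

    // The partition value comes from the PARTITION clause, so dt is not selected.
    spark.sql(
      """insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt = '2016-06-21')
        |select nid, price
        |from d_test_tpc_2g_txt.marytest
        |where dt = '2016-06-21'""".stripMargin)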





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""

2016-06-14 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329051#comment-15329051
 ] 

marymwu commented on SPARK-15802:
-

We still have a question: how do we use the "binary" protocol? It seems to us that the 
shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default"" means using 
the "binary" protocol, but the SparkSQL connection failed in this situation.
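
For what it's worth, a hedged sketch of how the two sides are usually matched up, 
assuming the Spark Thrift Server picks up hive-site.xml/--hiveconf settings the way 
HiveServer2 does; the host name below is a placeholder. The server-side property 
hive.server2.transport.mode selects the wire protocol (binary by default, or http), and 
the beeline URL has to match it.

    # Server in binary mode (the default): a plain URL should connect.
    sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.transport.mode=binary
    bin/beeline -u "jdbc:hive2://host:10000/default"

    # Server in http mode: the URL must carry transportMode and httpPath.
    sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.transport.mode=http
    bin/beeline -u "jdbc:hive2://host:10000/default;transportMode=http;httpPath=cliservice"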

> SparkSQL connection fail using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default""
> --
>
> Key: SPARK-15802
> URL: https://issues.apache.org/jira/browse/SPARK-15802
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> reproduce steps:
> 1. execute shell "sbin/start-thriftserver.sh --master yarn";
> 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default"";
> Actually result:
> SparkSQL connection failed and the log shows as follows:
> 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=}
> 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=}
> note:
> SparkSQL connection succeeded, if using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default;transportMode=http;httpPath=cliservice""
> Two parameters(transportMode) have been added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-13 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328860#comment-15328860
 ] 

marymwu commented on SPARK-15757:
-

Any update?

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>    Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Result.png
>
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.co

[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""

2016-06-07 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319908#comment-15319908
 ] 

marymwu commented on SPARK-15802:
-

Looking forward to your reply, thanks.

> SparkSQL connection fail using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default""
> --
>
> Key: SPARK-15802
> URL: https://issues.apache.org/jira/browse/SPARK-15802
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> reproduce steps:
> 1. execute shell "sbin/start-thriftserver.sh --master yarn";
> 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default"";
> Actually result:
> SparkSQL connection failed and the log shows as follows:
> 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=}
> 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=}
> note:
> SparkSQL connection succeeded, if using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default;transportMode=http;httpPath=cliservice""
> Two parameters(transportMode) have been added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""

2016-06-07 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319906#comment-15319906
 ] 

marymwu commented on SPARK-15802:
-

What's the right protocol? How do we specify it?

> SparkSQL connection fail using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default""
> --
>
> Key: SPARK-15802
> URL: https://issues.apache.org/jira/browse/SPARK-15802
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> reproduce steps:
> 1. execute shell "sbin/start-thriftserver.sh --master yarn";
> 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default"";
> Actually result:
> SparkSQL connection failed and the log shows as follows:
> 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=}
> 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for 
> buffer 
> HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
>  application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
> HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=}
> note:
> SparkSQL connection succeeded, if using shell command "bin/beeline -u 
> "jdbc:hive2://*.*.*.*:10000/default;transportMode=http;httpPath=cliservice""
> Two parameters(transportMode) have been added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[GitHub] spark pull request #13550: SPARK-15755

2016-06-07 Thread marymwu
GitHub user marymwu opened a pull request:

https://github.com/apache/spark/pull/13550

SPARK-15755

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-15755

java.lang.NullPointerException when running Spark 2.0 with 
spark.serializer=org.apache.spark.serializer.KryoSerializer
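
For context, a minimal sketch of the configuration in question; the query is a 
placeholder, chosen because an ORDER BY ... LIMIT result is backed by 
org.apache.spark.util.BoundedPriorityQueue, the class named in the reported stack trace.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("kryo-npe-context")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Placeholder top-k query; task results for ORDER BY ... LIMIT carry a
    // BoundedPriorityQueue that is deserialized with the configured serializer.
    spark.range(0, 100000).createOrReplaceTempView("t")
    spark.sql("SELECT id FROM t ORDER BY id LIMIT 10").show()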

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marymwu/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13550


commit a2f43c2f59b461a37947a5696198a4aa7339579d
Author: Dongyang DY2 Tang <tang...@lenovo.com>
Date:   2016-06-08T01:37:13Z

fix bug: java.lang.NullPointerException when run spark 2.0 setting 
spark.serializer=org.apache.spark.serializer.KryoSerializer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[jira] [Created] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""

2016-06-07 Thread marymwu (JIRA)
marymwu created SPARK-15802:
---

 Summary: SparkSQL connection fail using shell command "bin/beeline 
-u "jdbc:hive2://*.*.*.*:10000/default""
 Key: SPARK-15802
 URL: https://issues.apache.org/jira/browse/SPARK-15802
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
        Reporter: marymwu


reproduce steps:
1. execute shell "sbin/start-thriftserver.sh --master yarn";
2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default"";
Actual result:
The SparkSQL connection failed and the log shows the following:
16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for 
buffer 
HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
 application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=}
16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for 
buffer 
HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type:
 application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for 
HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=}

Note:
The SparkSQL connection succeeded when using the shell command "bin/beeline -u 
"jdbc:hive2://*.*.*.*:10000/default;transportMode=http;httpPath=cliservice"".
Two parameters (transportMode and httpPath) have been added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer

2016-06-06 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317782#comment-15317782
 ] 

marymwu commented on SPARK-15755:
-

Any comments?

> java.lang.NullPointerException when run spark 2.0 setting 
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> -
>
> Key: SPARK-15755
> URL: https://issues.apache.org/jira/browse/SPARK-15755
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> java.lang.NullPointerException when run spark 2.0 setting 
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> underlying (org.apache.spark.util.BoundedPriorityQueue)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
>   at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
>   at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148)
>   at scala.math.Ordering$$anon$4.compare(Ordering.scala:111)
>   at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
>   at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
>   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
>   at java.util.PriorityQueue.add(PriorityQueue.java:306)
>   at 
> com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78)
>   at 
> com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>   ... 15 more
> 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> underlying (org.apache.spark.util.BoundedPriorityQueue)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
>   at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
>   at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>   at 
> org.apache.spark.scheduler

[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-06 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317778#comment-15317778
 ] 

marymwu commented on SPARK-15757:
-

The steps that produce the error are as follows (hope it helps):
1. Use the hive command line to create a table, for example:
"create table inventory
(
inv_date_sk   int,
inv_item_sk   int,
inv_warehouse_sk  int,
inv_quantity_on_hand  int
) row format delimited fields terminated by '\|'
stored as orc;"
2. Using the hive command line, execute "insert overwrite table inventory select * from 
sourcTb;" --> important step
3. Using the spark command line, execute "select * from inventory;" --> the error 
occurs as in the description.
===
When we tried the following steps instead, things looked fine:
1. Use the hive command line to create a table, for example:
"create table inventory
(
inv_date_sk   int,
inv_item_sk   int,
inv_warehouse_sk  int,
inv_quantity_on_hand  int
) row format delimited fields terminated by '\|'
stored as orc;"
2. Using the spark command line, execute "insert overwrite table inventory select * from 
sourcTb;" --> important step
3. Using the spark command line, execute "select * from inventory;" --> succeeded.
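
A hedged diagnostic sketch that may narrow this down, assuming Spark 2.x with Hive 
support; the warehouse path below is a placeholder for wherever the inventory files 
actually live. Files written by Hive's ORC writer often carry positional column names 
(_col0, _col1, ...), and if that is the case here, a by-name lookup of "inv_date_sk" 
against the file schema would fail exactly as reported.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("orc-schema-check")
      .enableHiveSupport()
      .getOrCreate()

    // Compare the physical file schema with what "desc inventory" reports.
    spark.read.orc("/user/hive/warehouse/inventory").printSchema()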

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Result.png
>
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 

[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-05 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316170#comment-15316170
 ] 

marymwu commented on SPARK-15757:
-

Actually, the field "inv_date_sk" does exist! We have executed "desc inventory"; 
the result is attached.

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Result.png
>
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> ja

[jira] [Issue Comment Deleted] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc

2016-06-05 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-15757:

Comment: was deleted

(was: Actually,  Field "inv_date_sk" does exist! We have executed "desc 
inventory", the result is as attached.
)

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Result.png
>
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> ja

[jira] [Updated] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-05 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-15757:

Attachment: Result.png

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>    Affects Versions: 2.0.0
>Reporter: marymwu
> Attachments: Result.png
>
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExe

[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-05 Thread marymwu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316168#comment-15316168
 ] 

marymwu commented on SPARK-15757:
-

Actually, the field "inv_date_sk" does exist! We have executed "desc inventory"; 
the result is attached.


> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.uti

[jira] [Updated] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2016-06-03 Thread marymwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

marymwu updated SPARK-15757:

Summary: Error occurs when using Spark sql "select" statement on orc file 
after hive sql "insert overwrite tb1 select * from sourcTb" has been executed 
on this orc file  (was: Error occurs when using Spark sql "select" statement on 
orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been 
executed)

> Error occurs when using Spark sql "select" statement on orc file after hive 
> sql "insert overwrite tb1 select * from sourcTb" has been executed on this 
> orc file
> ---
>
> Key: SPARK-15757
> URL: https://issues.apache.org/jira/browse/SPARK-15757
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> An error occurs when using a Spark SQL "select" statement on an ORC file after 
> the Hive SQL statement "insert overwrite tb1 select * from sourcTb" has been executed
> 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): 
> java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
>   at 
> org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
>   at 
> org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   

[jira] [Created] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed

2016-06-03 Thread marymwu (JIRA)
marymwu created SPARK-15757:
---

 Summary: Error occurs when using Spark sql "select" statement on 
orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been 
executed
 Key: SPARK-15757
 URL: https://issues.apache.org/jira/browse/SPARK-15757
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
        Reporter: marymwu


An error occurs when using a Spark SQL "select" statement on an ORC file after the 
Hive SQL statement "insert overwrite tb1 select * from sourcTb" has been executed.

0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in stage 7.0 
(TID 2532, smokeslave5.avatar.lenovomm.com): 
java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist.
at 
org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
at 
org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at 
org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251)
at 
org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
at 
org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:94)
at 
org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361)
at 
org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123)
at 
org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112)
at 
org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278)
at 
org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace: (state=,code=0)
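
A minimal Scala sketch of the sequence reported above, for reference only. The table 
inventory, the column inv_date_sk and the source table sourcTb come from the report; 
the column list, the second column and the use of spark.sql for every step are 
assumptions (in the report the insert overwrite is executed by Hive itself and the 
failing select is issued through beeline against the Spark Thrift server; the summary 
also names the target table tb1 while the failing query reads inventory, so the sketch 
uses inventory throughout):

// Sketch only: everything beyond "inventory", "inv_date_sk" and "sourcTb"
// is invented for illustration.
import org.apache.spark.sql.SparkSession

object Spark15757Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-15757 sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Assumed schemas for the two tables mentioned in the report.
    spark.sql("CREATE TABLE IF NOT EXISTS sourcTb (inv_date_sk INT, inv_quantity INT)")
    spark.sql("CREATE TABLE IF NOT EXISTS inventory (inv_date_sk INT, inv_quantity INT) STORED AS ORC")

    // In the report this statement is executed by Hive, so the ORC files under
    // the table location are written by Hive's ORC writer.
    spark.sql("INSERT OVERWRITE TABLE inventory SELECT * FROM sourcTb")

    // The step that fails in the report: reading the freshly written ORC files
    // back through Spark SQL.
    spark.sql("SELECT * FROM inventory").show()

    spark.stop()
  }
}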



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15756) SQL “stored as orcfile” cannot be supported while hive supports both keywords "orc" and "orcfile"

2016-06-03 Thread marymwu (JIRA)
marymwu created SPARK-15756:
---

 Summary: SQL “stored as orcfile” cannot be supported while hive 
supports both keywords "orc" and "orcfile"
 Key: SPARK-15756
 URL: https://issues.apache.org/jira/browse/SPARK-15756
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.0
        Reporter: marymwu
Priority: Minor
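
The issue carries no description body; a short Scala sketch of the two DDL spellings 
the summary contrasts follows, assuming a throwaway table name and schema. Per the 
report, the first form is accepted by both Hive and Spark 2.0, while the second is 
accepted by Hive but rejected by Spark 2.0:

// Both table names and the (id, name) schema are invented for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Accepted by both Hive and Spark 2.0.
spark.sql("CREATE TABLE demo_orc (id INT, name STRING) STORED AS ORC")

// Accepted by Hive; reported here as not supported by Spark 2.0's parser.
spark.sql("CREATE TABLE demo_orcfile (id INT, name STRING) STORED AS ORCFILE")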






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer

2016-06-03 Thread marymwu (JIRA)
marymwu created SPARK-15755:
---

 Summary: java.lang.NullPointerException when run spark 2.0 setting 
spark.serializer=org.apache.spark.serializer.KryoSerializer
 Key: SPARK-15755
 URL: https://issues.apache.org/jira/browse/SPARK-15755
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: marymwu


A java.lang.NullPointerException is thrown when running Spark 2.0 with 
spark.serializer=org.apache.spark.serializer.KryoSerializer set.
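
A minimal Scala sketch of a setup consistent with the stack trace below, for reference 
only. The Kryo setting comes from the report; the data and the top-N query are 
assumptions, chosen because BoundedPriorityQueue and LazilyGeneratedOrdering in the 
trace point at an ordered LIMIT whose per-task results are deserialized on the driver 
by TaskResultGetter:

// The DataFrame contents and the query shape are assumptions; only the
// spark.serializer setting is taken from the report.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SPARK-15755 sketch")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()
import spark.implicits._

val df = spark.range(0L, 1000000L).toDF("id")

// An ordered LIMIT collected to the driver: executors ship partial top-N results
// back, and the driver deserializes them with Kryo, which is the code path shown
// in the trace.
df.orderBy($"id".desc).limit(10).collect()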

16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
underlying (org.apache.spark.util.BoundedPriorityQueue)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312)
at 
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157)
at 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148)
at scala.math.Ordering$$anon$4.compare(Ordering.scala:111)
at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
at java.util.PriorityQueue.offer(PriorityQueue.java:329)
at java.util.PriorityQueue.add(PriorityQueue.java:306)
at 
com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78)
at 
com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 15 more
16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
underlying (org.apache.spark.util.BoundedPriorityQueue)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)
at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312)
at 
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala