[jira] [Resolved] (SPARK-30784) Hive 2.3 profile should still use orc-nohive

2020-03-04 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-30784.
--
Resolution: Not A Bug

Resolving it because with Hive 2.3, using regular orc is required. 

> Hive 2.3 profile should still use orc-nohive
> 
>
> Key: SPARK-30784
> URL: https://issues.apache.org/jira/browse/SPARK-30784
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yin Huai
>Priority: Critical
>
> Originally reported at 
> [https://github.com/apache/spark/pull/26619#issuecomment-583802901]
>  
> Right now, the Hive 2.3 profile pulls in regular orc, which depends on 
> hive-storage-api. However, hive-storage-api and hive-common have the 
> following class files in common:
>  
> org/apache/hadoop/hive/common/ValidReadTxnList.class
>  org/apache/hadoop/hive/common/ValidTxnList.class
>  org/apache/hadoop/hive/common/ValidTxnList$RangeResponse.class
> For example, 
> [https://github.com/apache/hive/blob/rel/storage-release-2.6.0/storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
>  (pulled in by orc 1.5.8) and 
> [https://github.com/apache/hive/blob/rel/release-2.3.6/common/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
>  (from hive-common 2.3.6) both are in the classpath and they are different. 
> Having both versions in the classpath can cause unexpected behavior due to 
> classloading order. We should still use orc-nohive, which has 
> hive-storage-api shaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-30976) Improve Maven Install Logic in build/mvn

2020-02-27 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai deleted SPARK-30976:
-


> Improve Maven Install Logic in build/mvn
> 
>
> Key: SPARK-30976
> URL: https://issues.apache.org/jira/browse/SPARK-30976
> Project: Spark
>  Issue Type: Improvement
>Reporter: Wesley Hsiao
>Priority: Major
>
> The current code in build/mvn lacks a validation step to test the installed 
> maven binary. This is a point of failure where Apache Jenkins jobs can fail 
> because a maven binary can fail to run due to a corrupted download from an 
> Apache mirror.
> To improve the stability of Apache Jenkins builds, maven binary test logic 
> should be added after the maven download to verify that the maven binary 
> works. If it doesn't pass the test, then download and install from the Apache 
> archive repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30784) Hive 2.3 profile should still use orc-nohive

2020-02-10 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-30784:
-
Description: 
Originally reported at 
[https://github.com/apache/spark/pull/26619#issuecomment-583802901]

 

Right now, the Hive 2.3 profile pulls in regular orc, which depends on 
hive-storage-api. However, hive-storage-api and hive-common have the following 
class files in common:

 

org/apache/hadoop/hive/common/ValidReadTxnList.class
 org/apache/hadoop/hive/common/ValidTxnList.class
 org/apache/hadoop/hive/common/ValidTxnList$RangeResponse.class

For example, 
[https://github.com/apache/hive/blob/rel/storage-release-2.6.0/storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (pulled in by orc 1.5.8) and 
[https://github.com/apache/hive/blob/rel/release-2.3.6/common/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (from hive-common 2.3.6) both are in the classpath and they are different. 
Having both versions in the classpath can cause unexpected behavior due to 
classloading order. We should still use orc-nohive, which has hive-storage-api 
shaded.

  was:
Originally reported at 
[https://github.com/apache/spark/pull/26619#issuecomment-583802901]

 

Right now, the Hive 2.3 profile pulls in regular orc, which depends on 
hive-storage-api. However, hive-storage-api and hive-common have the following 
class files in common:

 

{{org/apache/hadoop/hive/common/ValidReadTxnList.class
org/apache/hadoop/hive/common/ValidTxnList.class
org/apache/hadoop/hive/common/ValidTxnList$RangeResponse.class}}

For example, 
[https://github.com/apache/hive/blob/rel/storage-release-2.6.0/storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (pulled in by orc 1.5.8) and 
[https://github.com/apache/hive/blob/rel/release-2.3.6/common/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (from hive-common 2.3.6) both are in the classpath and they are different. 
Having both versions in the classpath can cause unexpected behavior due to 
classloading order. We should still use orc-nohive, which has hive-storage-api 
shaded.


> Hive 2.3 profile should still use orc-nohive
> 
>
> Key: SPARK-30784
> URL: https://issues.apache.org/jira/browse/SPARK-30784
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yin Huai
>Priority: Blocker
>
> Originally reported at 
> [https://github.com/apache/spark/pull/26619#issuecomment-583802901]
>  
> Right now, the Hive 2.3 profile pulls in regular orc, which depends on 
> hive-storage-api. However, hive-storage-api and hive-common have the 
> following class files in common:
>  
> org/apache/hadoop/hive/common/ValidReadTxnList.class
>  org/apache/hadoop/hive/common/ValidTxnList.class
>  org/apache/hadoop/hive/common/ValidTxnList$RangeResponse.class
> For example, 
> [https://github.com/apache/hive/blob/rel/storage-release-2.6.0/storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
>  (pulled in by orc 1.5.8) and 
> [https://github.com/apache/hive/blob/rel/release-2.3.6/common/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
>  (from hive-common 2.3.6) both are in the classpath and they are different. 
> Having both versions in the classpath can cause unexpected behavior due to 
> classloading order. We should still use orc-nohive, which has 
> hive-storage-api shaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30783) Hive 2.3 profile should exclude hive-service-rpc

2020-02-10 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-30783:
-
Attachment: hive-service-rpc-2.3.6-classes
spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364-classes

> Hive 2.3 profile should exclude hive-service-rpc
> 
>
> Key: SPARK-30783
> URL: https://issues.apache.org/jira/browse/SPARK-30783
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Blocker
> Attachments: hive-service-rpc-2.3.6-classes, 
> spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364-classes
>
>
> hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate 
> classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark 
> can pick up classes defined in hive instead of its thrift server module, 
> which can cause hard-to-debug runtime errors due to class loading order, and 
> compilation errors for applications that depend on Spark.
>  
> If you compare hive-service-rpc 2.3.6's jar 
> ([https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar])
>  and spark thrift server's jar (e.g. 
> [https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar),]
>  you will see that all of the classes provided by hive-service-rpc-2.3.6.jar are 
> covered by the Spark thrift server's jar. I am attaching the list of jar contents 
> for your reference.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30783) Hive 2.3 profile should exclude hive-service-rpc

2020-02-10 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-30783:
-
Description: 
hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate 
classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark can 
pick up classes defined in hive instead of its thrift server module, which can 
cause hard-to-debug runtime errors due to class loading order, and compilation 
errors for applications that depend on Spark.

 

If you compare hive-service-rpc 2.3.6's jar 
([https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar])
 and spark thrift server's jar (e.g. 
[https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar),]
 you will see that all of the classes provided by hive-service-rpc-2.3.6.jar are 
covered by the Spark thrift server's jar. I am attaching the list of jar contents 
for your reference.

 

  was:hive-service-rpc 2.3.6 and spark sql's thrift server module have 
duplicate classes. Leaving hive-service-rpc 2.3.6 in the class path means that 
spark can pick up classes defined in hive instead of its thrift server module, 
which can cause hard-to-debug runtime errors due to class loading order, and 
compilation errors for applications that depend on Spark.


> Hive 2.3 profile should exclude hive-service-rpc
> 
>
> Key: SPARK-30783
> URL: https://issues.apache.org/jira/browse/SPARK-30783
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Blocker
>
> hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate 
> classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark 
> can pick up classes defined in hive instead of its thrift server module, 
> which can cause hard-to-debug runtime errors due to class loading order, and 
> compilation errors for applications that depend on Spark.
>  
> If you compare hive-service-rpc 2.3.6's jar 
> ([https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar])
>  and spark thrift server's jar (e.g. 
> [https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar),]
>  you will see that all of the classes provided by hive-service-rpc-2.3.6.jar are 
> covered by the Spark thrift server's jar. I am attaching the list of jar contents 
> for your reference.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30784) Hive 2.3 profile should still use orc-nohive

2020-02-10 Thread Yin Huai (Jira)
Yin Huai created SPARK-30784:


 Summary: Hive 2.3 profile should still use orc-nohive
 Key: SPARK-30784
 URL: https://issues.apache.org/jira/browse/SPARK-30784
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yin Huai


Originally reported at 
[https://github.com/apache/spark/pull/26619#issuecomment-583802901]

 

Right now, the Hive 2.3 profile pulls in regular orc, which depends on 
hive-storage-api. However, hive-storage-api and hive-common have the following 
class files in common:

 

{{org/apache/hadoop/hive/common/ValidReadTxnList.class
org/apache/hadoop/hive/common/ValidTxnList.class
org/apache/hadoop/hive/common/ValidTxnList$RangeResponse.class}}

For example, 
[https://github.com/apache/hive/blob/rel/storage-release-2.6.0/storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (pulled in by orc 1.5.8) and 
[https://github.com/apache/hive/blob/rel/release-2.3.6/common/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java]
 (from hive-common 2.3.6) both are in the classpath and they are different. 
Having both versions in the classpath can cause unexpected behavior due to 
classloading order. We should still use orc-nohive, which has hive-storage-api 
shaded.
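
For illustration only, a minimal Maven sketch of the direction described above, 
assuming the "nohive" classifier that the ORC project publishes for orc-core 
(which shades hive-storage-api); the exact coordinates, version, and exclusions 
used in Spark's pom may differ:

{code:xml}
<!-- Sketch only (assumed coordinates): pull the nohive ORC artifact, which
     shades hive-storage-api, and keep the unshaded hive-storage-api that the
     regular orc-core would bring in off the classpath. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.5.8</version>
  <classifier>nohive</classifier>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-storage-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

With this shape, the ValidTxnList classes used by ORC live only under the shaded 
package inside orc-core-nohive, so the copies in hive-common 2.3.6 are the only 
ones left at the original package on the classpath.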



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30783) Hive 2.3 profile should exclude hive-service-rpc

2020-02-10 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-30783:


Assignee: Yin Huai

> Hive 2.3 profile should exclude hive-service-rpc
> 
>
> Key: SPARK-30783
> URL: https://issues.apache.org/jira/browse/SPARK-30783
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Blocker
>
> hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate 
> classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark 
> can pick up classes defined in hive instead of its thrift server module, 
> which can cause hard-to-debug runtime errors due to class loading order, and 
> compilation errors for applications that depend on Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30783) Hive 2.3 profile should exclude hive-service-rpc

2020-02-10 Thread Yin Huai (Jira)
Yin Huai created SPARK-30783:


 Summary: Hive 2.3 profile should exclude hive-service-rpc
 Key: SPARK-30783
 URL: https://issues.apache.org/jira/browse/SPARK-30783
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yin Huai


hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate 
classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark can 
pick up classes defined in hive instead of its thrift server module, which can 
cause hard-to-debug runtime errors due to class loading order, and compilation 
errors for applications that depend on Spark.
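
For illustration only, a minimal Maven sketch of the kind of exclusion being 
proposed, assuming hive-service-rpc is pulled in transitively through a Hive 
dependency such as hive-service; the actual dependency that brings it in under 
the Hive 2.3 profile may differ:

{code:xml}
<!-- Sketch only (assumed coordinates): keep hive-service-rpc off the classpath
     so the copies of these classes in spark-hive-thriftserver are the only
     ones visible at runtime and compile time. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-service</artifactId>
  <version>2.3.6</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-service-rpc</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}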



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30450) Exclude .git folder for python linter

2020-01-07 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-30450:
-
Affects Version/s: (was: 2.4.4)
   3.0.0

> Exclude .git folder for python linter
> -
>
> Key: SPARK-30450
> URL: https://issues.apache.org/jira/browse/SPARK-30450
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Minor
>
> The python linter shouldn't include the .git folder. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30450) Exclude .git folder for python linter

2020-01-07 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-30450:
-
Priority: Minor  (was: Major)

> Exclude .git folder for python linter
> -
>
> Key: SPARK-30450
> URL: https://issues.apache.org/jira/browse/SPARK-30450
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Minor
>
> The python linter shouldn't include the .git folder. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30450) Exclude .git folder for python linter

2020-01-07 Thread Yin Huai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-30450:


Assignee: Eric Chang

> Exclude .git folder for python linter
> -
>
> Key: SPARK-30450
> URL: https://issues.apache.org/jira/browse/SPARK-30450
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Major
>
> The python linter shouldn't include the .git folder. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25019) The published spark sql pom does not exclude the normal version of orc-core

2018-08-06 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-25019.
--
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.4.0

[https://github.com/apache/spark/pull/22003] has been merged.

> The published spark sql pom does not exclude the normal version of orc-core 
> 
>
> Key: SPARK-25019
> URL: https://issues.apache.org/jira/browse/SPARK-25019
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.4.0
>Reporter: Yin Huai
>Assignee: Dongjoon Hyun
>Priority: Critical
> Fix For: 2.4.0
>
>
> I noticed that 
> [https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.11/2.4.0-SNAPSHOT/spark-sql_2.11-2.4.0-20180803.100335-189.pom]
>  does not exclude the normal version of orc-core. Comparing with 
> [https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/sql/core/pom.xml#L108]
>  and 
> [https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/pom.xml#L1767,]
>  we only exclude the normal version of orc-core in the parent pom. So, the 
> problem is that if a developer depends on spark-sql-core directly, orc-core 
> and orc-core-nohive will be in the dependency list. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25019) The published spark sql pom does not exclude the normal version of orc-core

2018-08-03 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568554#comment-16568554
 ] 

Yin Huai commented on SPARK-25019:
--

[~dongjoon] can you help us fix this issue? Or is there a reason that the 
parent pom and sql/core/pom are not consistent?

> The published spark sql pom does not exclude the normal version of orc-core 
> 
>
> Key: SPARK-25019
> URL: https://issues.apache.org/jira/browse/SPARK-25019
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.4.0
>Reporter: Yin Huai
>Priority: Critical
>
> I noticed that 
> [https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.11/2.4.0-SNAPSHOT/spark-sql_2.11-2.4.0-20180803.100335-189.pom]
>  does not exclude the normal version of orc-core. Comparing with 
> [https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/sql/core/pom.xml#L108]
>  and 
> [https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/pom.xml#L1767,]
>  we only exclude the normal version of orc-core in the parent pom. So, the 
> problem is that if a developer depends on spark-sql-core directly, orc-core 
> and orc-core-nohive will be in the dependency list. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25019) The published spark sql pom does not exclude the normal version of orc-core

2018-08-03 Thread Yin Huai (JIRA)
Yin Huai created SPARK-25019:


 Summary: The published spark sql pom does not exclude the normal 
version of orc-core 
 Key: SPARK-25019
 URL: https://issues.apache.org/jira/browse/SPARK-25019
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Affects Versions: 2.4.0
Reporter: Yin Huai


I noticed that 
[https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.11/2.4.0-SNAPSHOT/spark-sql_2.11-2.4.0-20180803.100335-189.pom]
 does not exclude the normal version of orc-core. Comparing with 
[https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/sql/core/pom.xml#L108]
 and 
[https://github.com/apache/spark/blob/92b48842b944a3e430472294cdc3c481bad6b804/pom.xml#L1767,]
 we only exclude the normal version of orc-core in the parent pom. So, the 
problem is that if a developer depends on spark-sql-core directly, orc-core and 
orc-core-nohive will be in the dependency list. 
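
For illustration only, a minimal downstream workaround sketch rather than the 
actual fix in the Spark poms: a project that depends on spark-sql directly could 
exclude the regular orc-core itself until the published pom carries the 
exclusion.

{code:xml}
<!-- Sketch only: consumer-side exclusion so that only orc-core-nohive remains
     in the dependency list when depending on spark-sql directly. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.orc</groupId>
      <artifactId>orc-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}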



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-27 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559977#comment-16559977
 ] 

Yin Huai commented on SPARK-24895:
--

[https://github.com/spotbugs/spotbugs-maven-plugin/issues/21] has some info on 
it. I am wondering if it requires upgrading both the plugin and maven. We 
probably need to set up a testing Jenkins job to make sure everything works 
before checking in changes.

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
> {noformat}
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
> This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-24 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554932#comment-16554932
 ] 

Yin Huai commented on SPARK-24895:
--

[~hyukjin.kwon] [~kiszk] it seems this revert indeed fixed the problem :)

 

 

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
> {noformat}
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
> This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-24 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-24895.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

[https://github.com/apache/spark/pull/21865] has been merged.

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Assignee: Eric Chang
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
> {noformat}
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
> This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-23 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553610#comment-16553610
 ] 

Yin Huai commented on SPARK-24895:
--

[~kiszk] [~hyukjin.kwon] since this is pretty tricky to actually test out, do 
you mind if I remove the spotbugs plugin and test our nightly 
snapshot build? If this plugin is not the cause, I will add it back. If it is 
indeed the cause, we can figure out how to fix it. Thanks!
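
For illustration only, a minimal sketch of one way to skip the plugin for such a 
test build, assuming the spotbugs-maven-plugin's standard skip parameter; the 
version shown is an assumption:

{code:xml}
<!-- Sketch only: skip spotbugs in the nightly snapshot build to check whether
     the plugin is what breaks the deployed snapshot metadata. -->
<plugin>
  <groupId>com.github.spotbugs</groupId>
  <artifactId>spotbugs-maven-plugin</artifactId>
  <version>3.1.3</version>
  <configuration>
    <skip>true</skip>
  </configuration>
</plugin>
{code}

Passing -Dspotbugs.skip=true on the Maven command line for the test run should 
be equivalent.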

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Priority: Major
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
>  This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-23 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553568#comment-16553568
 ] 

Yin Huai commented on SPARK-24895:
--

 

[~kiszk] and [~hyukjin.kwon] we hit this issue today. Per 
[https://github.com/spotbugs/spotbugs-maven-plugin/issues/21,] it may be 
related to the spot-bugs plugin. We are trying to verify it now.

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Priority: Major
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
>  This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24895) Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames

2018-07-23 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-24895:
-
Target Version/s: 2.4.0

> Spark 2.4.0 Snapshot artifacts has broken metadata due to mismatched filenames
> --
>
> Key: SPARK-24895
> URL: https://issues.apache.org/jira/browse/SPARK-24895
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Eric Chang
>Priority: Major
>
> Spark 2.4.0 has Maven build errors because artifacts uploaded to the Apache Maven 
> repo have mismatched filenames:
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce 
> (enforce-banned-dependencies) on project spark_2.4: Execution 
> enforce-banned-dependencies of goal 
> org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: 
> org.apache.maven.shared.dependency.graph.DependencyGraphBuilderException: 
> Could not resolve following dependencies: 
> [org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT (compile), 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT (compile)]: Could not 
> resolve dependencies for project com.databricks:spark_2.4:pom:1: The 
> following artifacts could not be resolved: 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-network-shuffle_2.11:jar:2.4.0-SNAPSHOT, 
> org.apache.spark:spark-sketch_2.11:jar:2.4.0-SNAPSHOT: Could not find 
> artifact 
> org.apache.spark:spark-mllib-local_2.11:jar:2.4.0-20180723.232411-177 in 
> apache-snapshots ([https://repository.apache.org/snapshots/]) -> [Help 1]
>  
> If you check the artifact metadata you will see the pom and jar files are 
> 2.4.0-20180723.232411-177 instead of 2.4.0-20180723.232410-177:
> {code:xml}
> <metadata>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib-local_2.11</artifactId>
>   <version>2.4.0-SNAPSHOT</version>
>   <versioning>
>     <snapshot>
>       <timestamp>20180723.232411</timestamp>
>       <buildNumber>177</buildNumber>
>     </snapshot>
>     <lastUpdated>20180723232411</lastUpdated>
>     <snapshotVersions>
>       <snapshotVersion>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <extension>pom</extension>
>         <value>2.4.0-20180723.232411-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>tests</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>       <snapshotVersion>
>         <classifier>test-sources</classifier>
>         <extension>jar</extension>
>         <value>2.4.0-20180723.232410-177</value>
>         <updated>20180723232411</updated>
>       </snapshotVersion>
>     </snapshotVersions>
>   </versioning>
> </metadata>
> {code}
>  
>  This behavior is very similar to this issue: 
> https://issues.apache.org/jira/browse/MDEPLOY-221
> Since 2.3.0 snapshots work with the same maven 3.3.9 version and maven deploy 
> 2.8.2 plugin, it is highly possible that we introduced a new plugin that 
> causes this. 
> The most recent addition is the spot-bugs plugin, which is known to have 
> incompatibilities with other plugins: 
> [https://github.com/spotbugs/spotbugs-maven-plugin/issues/21]
> We may want to try building without it to sanity check.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349679#comment-16349679
 ] 

Yin Huai commented on SPARK-23310:
--

[~sitalke...@gmail.com] We found that the commit for SPARK-21113 introduced a 
noticeable regression. Because Q95 is a join-heavy query, which represents one 
set of common workloads, I am concerned that this regression is quite easy for 
users of Spark 2.3 to hit. Considering that setting 
spark.unsafe.sorter.spill.read.ahead.enabled to false improves the overall 
performance of all TPC-DS queries, how about we set 
spark.unsafe.sorter.spill.read.ahead.enabled to false by default in Spark 2.3? 
Then, we can look into how to resolve this regression for Spark 2.4. What do 
you think?

(Feel free to enable it for your workloads, because such workloads will 
definitely help Spark improve this part :) )

> Perf regression introduced by SPARK-21113
> -
>
> Key: SPARK-23310
> URL: https://issues.apache.org/jira/browse/SPARK-23310
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yin Huai
>Priority: Blocker
>
> While running all TPC-DS queries with SF set to 1000, we noticed that Q95 
> (https://github.com/databricks/spark-sql-perf/blob/master/src/main/resources/tpcds_2_4/q95.sql)
>  has noticeable regression (11%). After looking into it, we found that the 
> regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream 
> gets lock congestion. After setting 
> spark.unsafe.sorter.spill.read.ahead.enabled to false, the regression 
> disappears and the overall performance of all TPC-DS queries improves.
>  
> I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to 
> false by default for Spark 2.3 and re-enable it after addressing the lock 
> congestion issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-23310:
-
Description: 
While running all TPC-DS queries with SF set to 1000, we noticed that Q95 
(https://github.com/databricks/spark-sql-perf/blob/master/src/main/resources/tpcds_2_4/q95.sql)
 has noticeable regression (11%). After looking into it, we found that the 
regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream gets 
lock congestion. After setting spark.unsafe.sorter.spill.read.ahead.enabled 
to false, the regression disappears and the overall performance of all TPC-DS 
queries improves.

 

I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to 
false by default for Spark 2.3 and re-enable it after addressing the lock 
congestion issue. 

  was:
While running all TPC-DS queries with SF set to 1000, we noticed that Q95 has 
noticeable regression (11%). After looking into it, we found that the 
regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream gets 
lock congestion. After setting spark.unsafe.sorter.spill.read.ahead.enabled 
to false, the regression disappears and the overall performance of all TPC-DS 
queries improves.

 

I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to 
false by default for Spark 2.3 and re-enable it after addressing the lock 
congestion issue. 


> Perf regression introduced by SPARK-21113
> -
>
> Key: SPARK-23310
> URL: https://issues.apache.org/jira/browse/SPARK-23310
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yin Huai
>Priority: Blocker
>
> While running all TPC-DS queries with SF set to 1000, we noticed that Q95 
> (https://github.com/databricks/spark-sql-perf/blob/master/src/main/resources/tpcds_2_4/q95.sql)
>  has noticeable regression (11%). After looking into it, we found that the 
> regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream 
> gets lock congestion. After setting 
> spark.unsafe.sorter.spill.read.ahead.enabled to false, the regression 
> disappears and the overall performance of all TPC-DS queries improves.
>  
> I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to 
> false by default for Spark 2.3 and re-enable it after addressing the lock 
> congestion issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)
Yin Huai created SPARK-23310:


 Summary: Perf regression introduced by SPARK-21113
 Key: SPARK-23310
 URL: https://issues.apache.org/jira/browse/SPARK-23310
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Yin Huai


While running all TPC-DS queries with SF set to 1000, we noticed that Q95 has 
noticeable regression (11%). After looking into it, we found that the 
regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream gets 
lock congestion. After setting spark.unsafe.sorter.spill.read.ahead.enabled 
to false, the regression disappears and the overall performance of all TPC-DS 
queries improves.

 

I am proposing that we set spark.unsafe.sorter.spill.read.ahead.enabled to 
false by default for Spark 2.3 and re-enable it after addressing the lock 
congestion issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-01 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349325#comment-16349325
 ] 

Yin Huai commented on SPARK-12297:
--

[~zi] has this issue got resolved in Hive? I see HIVE-12767 is still open.

> Add work-around for Parquet/Hive int96 timestamp bug.
> -
>
> Key: SPARK-12297
> URL: https://issues.apache.org/jira/browse/SPARK-12297
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Ryan Blue
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.3.0
>
>
> Spark copied Hive's behavior for parquet, but this was inconsistent with 
> other file formats, and inconsistent with Impala (which is the original 
> source of putting a timestamp as an int96 in parquet, I believe).  This made 
> timestamps in parquet act more like timestamps with timezones, while in other 
> file formats, timestamps have no time zone, they are a "floating time".
> The easiest way to see this issue is to write out a table with timestamps in 
> multiple different formats from one timezone, then try to read them back in 
> another timezone.  Eg., here I write out a few timestamps to parquet and 
> textfile hive tables, and also just as a json file, all in the 
> "America/Los_Angeles" timezone:
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val tblPrefix = args(0)
> val schema = new StructType().add("ts", TimestampType)
> val rows = sc.parallelize(Seq(
>   "2015-12-31 23:50:59.123",
>   "2015-12-31 22:49:59.123",
>   "2016-01-01 00:39:59.123",
>   "2016-01-01 01:29:59.123"
> ).map { x => Row(java.sql.Timestamp.valueOf(x)) })
> val rawData = spark.createDataFrame(rows, schema).toDF()
> rawData.show()
> Seq("parquet", "textfile").foreach { format =>
>   val tblName = s"${tblPrefix}_$format"
>   spark.sql(s"DROP TABLE IF EXISTS $tblName")
>   spark.sql(
> raw"""CREATE TABLE $tblName (
>   |  ts timestamp
>   | )
>   | STORED AS $format
>  """.stripMargin)
>   rawData.write.insertInto(tblName)
> }
> rawData.write.json(s"${tblPrefix}_json")
> {code}
> Then I start a spark-shell in "America/New_York" timezone, and read the data 
> back from each table:
> {code}
> scala> spark.sql("select * from la_parquet").collect().foreach{println}
> [2016-01-01 02:50:59.123]
> [2016-01-01 01:49:59.123]
> [2016-01-01 03:39:59.123]
> [2016-01-01 04:29:59.123]
> scala> spark.sql("select * from la_textfile").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").join(spark.sql("select * from 
> la_textfile"), "ts").show()
> ++
> |  ts|
> ++
> |2015-12-31 23:50:...|
> |2015-12-31 22:49:...|
> |2016-01-01 00:39:...|
> |2016-01-01 01:29:...|
> ++
> scala> spark.read.json("la_json").join(spark.sql("select * from la_parquet"), 
> "ts").show()
> +---+
> | ts|
> +---+
> +---+
> {code}
> The textfile and json based data show the same times, and can be joined 
> against each other, while the times from the parquet data have changed (and 
> obviously joins fail).
> This is a big problem for any organization that may try to read the same data 
> (say in S3) with clusters in multiple timezones.  It can also be a nasty 
> surprise as an organization tries to migrate file formats.  Finally, it's a 
> source of incompatibility between Hive, Impala, and Spark.
> HIVE-12767 aims to fix this by introducing a table property which indicates 
> the "storage timezone" for the table.  Spark should add the same to ensure 
> consistency between file formats, and with Hive & Impala.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23292) python tests related to pandas are skipped

2018-02-01 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-23292:
-
Priority: Critical  (was: Blocker)

> python tests related to pandas are skipped
> --
>
> Key: SPARK-23292
> URL: https://issues.apache.org/jira/browse/SPARK-23292
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Yin Huai
>Priority: Critical
>
> I was running python tests and found that 
> [pyspark.sql.tests.GroupbyAggPandasUDFTests.test_unsupported_types|https://github.com/apache/spark/blob/52e00f70663a87b5837235bdf72a3e6f84e11411/python/pyspark/sql/tests.py#L4528-L4548]
>  does not run with Python 2 because the test uses "assertRaisesRegex" 
> (supported by Python 3) instead of "assertRaisesRegexp" (supported by Python 
> 2). However, spark jenkins does not fail because of this issue (see run 
> history at 
> [here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/]).
>  After looking into this issue, [seems test script will skip tests related to 
> pandas if pandas is not 
> installed|https://github.com/apache/spark/blob/2ac895be909de7e58e1051dc2a1bba98a25bf4be/python/pyspark/sql/tests.py#L51-L63],
>  which means that jenkins does not have pandas installed. 
>  
> Since pyarrow related tests have the same skipping logic, we will need to 
> check if jenkins has pyarrow installed correctly as well. 
>  
> Since features using pandas and pyarrow are in 2.3, we should fix the test 
> issue and make sure all tests pass before we make the release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23292) python tests related to pandas are skipped

2018-02-01 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349040#comment-16349040
 ] 

Yin Huai commented on SPARK-23292:
--

So, Jenkins does have the right versions of pandas and pyarrow for Python 3. 
There were some difficulties in upgrading pandas and installing pyarrow in 
Python 2 (see discussions in [https://github.com/apache/spark/pull/19884).]

> python tests related to pandas are skipped
> --
>
> Key: SPARK-23292
> URL: https://issues.apache.org/jira/browse/SPARK-23292
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Yin Huai
>Priority: Blocker
>
> I was running python tests and found that 
> [pyspark.sql.tests.GroupbyAggPandasUDFTests.test_unsupported_types|https://github.com/apache/spark/blob/52e00f70663a87b5837235bdf72a3e6f84e11411/python/pyspark/sql/tests.py#L4528-L4548]
>  does not run with Python 2 because the test uses "assertRaisesRegex" 
> (supported by Python 3) instead of "assertRaisesRegexp" (supported by Python 
> 2). However, spark jenkins does not fail because of this issue (see run 
> history at 
> [here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/]).
>  After looking into this issue, [seems test script will skip tests related to 
> pandas if pandas is not 
> installed|https://github.com/apache/spark/blob/2ac895be909de7e58e1051dc2a1bba98a25bf4be/python/pyspark/sql/tests.py#L51-L63],
>  which means that jenkins does not have pandas installed. 
>  
> Since pyarrow related tests have the same skipping logic, we will need to 
> check if jenkins has pyarrow installed correctly as well. 
>  
> Since features using pandas and pyarrow are in 2.3, we should fix the test 
> issue and make sure all tests pass before we make the release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23292) python tests related to pandas are skipped

2018-01-31 Thread Yin Huai (JIRA)
Yin Huai created SPARK-23292:


 Summary: python tests related to pandas are skipped
 Key: SPARK-23292
 URL: https://issues.apache.org/jira/browse/SPARK-23292
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.3.0
Reporter: Yin Huai


I was running the Python tests and found that 
[pyspark.sql.tests.GroupbyAggPandasUDFTests.test_unsupported_types|https://github.com/apache/spark/blob/52e00f70663a87b5837235bdf72a3e6f84e11411/python/pyspark/sql/tests.py#L4528-L4548]
 does not run with Python 2 because the test uses "assertRaisesRegex" 
(supported by Python 3) instead of "assertRaisesRegexp" (supported by Python 
2). However, Spark's Jenkins does not fail because of this issue (see the run 
history 
[here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/]).
 After looking into this, it turns out that [the test script skips the 
pandas-related tests when pandas is not 
installed|https://github.com/apache/spark/blob/2ac895be909de7e58e1051dc2a1bba98a25bf4be/python/pyspark/sql/tests.py#L51-L63],
 which means that Jenkins does not have pandas installed. 
 
Since the pyarrow-related tests have the same skipping logic, we will also need 
to check whether Jenkins has pyarrow installed correctly. 
 
Since the features that use pandas and pyarrow are in 2.3, we should fix this 
test issue and make sure all tests pass before we make the release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2018-01-25 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340544#comment-16340544
 ] 

Yin Huai commented on SPARK-4502:
-

I think it makes sense to target 2.4.0. 2.3.1 is a maintenance release, and 
since this is not a bug fix, it is not suitable for a maintenance release.

> Spark SQL reads unneccesary nested fields from Parquet
> --
>
> Key: SPARK-4502
> URL: https://issues.apache.org/jira/browse/SPARK-4502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Liwen Sun
>Priority: Critical
>
> When reading a field of a nested column from Parquet, SparkSQL reads and 
> assembles all the fields of that nested column. This is unnecessary, as 
> Parquet supports fine-grained field reads out of a nested column. This may 
> degrade the performance significantly when a nested column has many fields. 
> For example, I loaded json tweets data into SparkSQL and ran the following 
> query:
> {{SELECT User.contributors_enabled from Tweets;}}
> User is a nested structure that has 38 primitive fields (for Tweets schema, 
> see: https://dev.twitter.com/overview/api/tweets), here is the log message:
> {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 
> cell/ms}}
> For comparison, I also ran:
> {{SELECT User FROM Tweets;}}
> And here is the log message:
> {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}}
> So both queries load 38 columns from Parquet, while the first query only 
> needs 1 column. I also measured the bytes read within Parquet. In these two 
> cases, the same number of bytes (99365194 bytes) were read. 
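A minimal sketch of the access pattern described above; the path is hypothetical and a Tweets-style schema (a {{User}} struct with many primitive fields) is assumed:

{code}
// Hypothetical dataset where "User" is a struct with ~38 primitive fields.
val tweets = spark.read.parquet("/tmp/tweets")

// Only one nested leaf is needed; ideally the Parquet scan should read just
// this column instead of assembling the whole User struct.
val q = tweets.select("User.contributors_enabled")
q.explain()   // inspect the scan to see which columns are actually requested
q.collect()
{code}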



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22812) Failing cran-check on master

2017-12-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295532#comment-16295532
 ] 

Yin Huai commented on SPARK-22812:
--

Thank you guys!

> Failing cran-check on master 
> -
>
> Key: SPARK-22812
> URL: https://issues.apache.org/jira/browse/SPARK-22812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Hossein Falaki
>Priority: Minor
>
> When I run {{R/run-tests.sh}} or {{R/check-cran.sh}} I get the following 
> failure message:
> {code}
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) :
>   dims [product 22] do not match the length of object [0]
> {code}
> cc [~felixcheung] have you experienced this error before?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21927) Spark pom.xml's dependency management is broken

2017-09-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154575#comment-16154575
 ] 

Yin Huai commented on SPARK-21927:
--

My worry is that it may mask actual issues related to dependencies. For 
example, the dependency resolvers may pick a version that is not specified in 
our pom.

> Spark pom.xml's dependency management is broken
> ---
>
> Key: SPARK-21927
> URL: https://issues.apache.org/jira/browse/SPARK-21927
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
> Environment: Apache Spark current master (commit 
> 12ab7f7e89ec9e102859ab3b710815d3058a2e8d)
>Reporter: Kris Mok
>
> When building the current Spark master just now (commit 
> 12ab7f7e89ec9e102859ab3b710815d3058a2e8d), I noticed that the build prints a 
> lot of warning messages such as the following. It looks like the dependency 
> management in the POMs has somehow been broken recently.
> {code:none}
> .../workspace/apache-spark/master (master) $ build/sbt clean package
> Attempting to fetch sbt
> Launching sbt from build/sbt-launch-0.13.16.jar
> [info] Loading project definition from 
> .../workspace/apache-spark/master/project
> [info] Updating 
> {file:.../workspace/apache-spark/master/project/}master-build...
> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
> [info] downloading 
> https://repo1.maven.org/maven2/org/scalastyle/scalastyle-sbt-plugin_2.10_0.13/1.0.0/scalastyle-sbt-plugin-1.0.0.jar
>  ...
> [info] [SUCCESSFUL ] 
> org.scalastyle#scalastyle-sbt-plugin;1.0.0!scalastyle-sbt-plugin.jar (239ms)
> [info] downloading 
> https://repo1.maven.org/maven2/org/scalastyle/scalastyle_2.10/1.0.0/scalastyle_2.10-1.0.0.jar
>  ...
> [info] [SUCCESSFUL ] 
> org.scalastyle#scalastyle_2.10;1.0.0!scalastyle_2.10.jar (465ms)
> [info] Done updating.
> [warn] Found version conflict(s) in library dependencies; some are suspected 
> to be binary incompatible:
> [warn] 
> [warn] * org.apache.maven.wagon:wagon-provider-api:2.2 is selected over 
> 1.0-beta-6
> [warn] +- org.apache.maven:maven-compat:3.0.4(depends 
> on 2.2)
> [warn] +- org.apache.maven.wagon:wagon-file:2.2  (depends 
> on 2.2)
> [warn] +- org.spark-project:sbt-pom-reader:1.0.0-spark 
> (scalaVersion=2.10, sbtVersion=0.13) (depends on 2.2)
> [warn] +- org.apache.maven.wagon:wagon-http-shared4:2.2  (depends 
> on 2.2)
> [warn] +- org.apache.maven.wagon:wagon-http:2.2  (depends 
> on 2.2)
> [warn] +- org.apache.maven.wagon:wagon-http-lightweight:2.2  (depends 
> on 2.2)
> [warn] +- org.sonatype.aether:aether-connector-wagon:1.13.1  (depends 
> on 1.0-beta-6)
> [warn] 
> [warn] * org.codehaus.plexus:plexus-utils:3.0 is selected over {2.0.7, 
> 2.0.6, 2.1, 1.5.5}
> [warn] +- org.apache.maven.wagon:wagon-provider-api:2.2  (depends 
> on 3.0)
> [warn] +- org.apache.maven:maven-compat:3.0.4(depends 
> on 2.0.6)
> [warn] +- org.sonatype.sisu:sisu-inject-plexus:2.3.0 (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-artifact:3.0.4  (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-core:3.0.4  (depends 
> on 2.0.6)
> [warn] +- org.sonatype.plexus:plexus-sec-dispatcher:1.3  (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-embedder:3.0.4  (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-settings:3.0.4  (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-settings-builder:3.0.4  (depends 
> on 2.0.6)
> [warn] +- org.apache.maven:maven-model-builder:3.0.4 (depends 
> on 2.0.7)
> [warn] +- org.sonatype.aether:aether-connector-wagon:1.13.1  (depends 
> on 2.0.7)
> [warn] +- org.sonatype.sisu:sisu-inject-plexus:2.2.3 (depends 
> on 2.0.7)
> [warn] +- org.apache.maven:maven-model:3.0.4 (depends 
> on 2.0.7)
> [warn] +- org.apache.maven:maven-aether-provider:3.0.4   (depends 
> on 2.0.7)
> [warn] +- org.apache.maven:maven-repository-metadata:3.0.4   (depends 
> on 2.0.7)
> [warn] 
> [warn] * cglib:cglib is evicted completely
> [warn] +- org.sonatype.sisu:sisu-guice:3.0.3 (depends 
> on 2.2.2)
> [warn] 
> [warn] * asm:asm is evicted completely
> [warn] +- cglib:cglib:2.2.2  (depends 
> on 3.3.1)
> [warn] 
> [warn] Run 'evicted' to see detailed eviction warnings
> {code}
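One generic way to keep such version choices explicit, rather than leaving them to the resolver, is an override in the build definition. This is only an illustrative sbt sketch, not a change to Spark's actual build:

{code}
// build.sbt (illustrative): pin conflicting libraries to the versions we intend,
// so eviction warnings like the ones above cannot silently pick something else.
dependencyOverrides += "org.codehaus.plexus" % "plexus-utils" % "3.0"
dependencyOverrides += "org.apache.maven.wagon" % "wagon-provider-api" % "2.2"
{code}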



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-21258) Window result incorrect using complex object with spilling

2017-07-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-21258:
-
Fix Version/s: (was: 2.1.2)

> Window result incorrect using complex object with spilling
> --
>
> Key: SPARK-21258
> URL: https://issues.apache.org/jira/browse/SPARK-21258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21258) Window result incorrect using complex object with spilling

2017-07-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075055#comment-16075055
 ] 

Yin Huai commented on SPARK-21258:
--

Since this change is not in branch-2.1, I am removing 2.1.2 from the list of 
fix versions.

> Window result incorrect using complex object with spilling
> --
>
> Key: SPARK-21258
> URL: https://issues.apache.org/jira/browse/SPARK-21258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21111) Fix test failure in 2.2

2017-06-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-21111.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 18316
[https://github.com/apache/spark/pull/18316]

> Fix test failure in 2.2 
> 
>
> Key: SPARK-21111
> URL: https://issues.apache.org/jira/browse/SPARK-21111
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
> Fix For: 2.2.0
>
>
> Test failure:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-20311) SQL "range(N) as alias" or "range(N) alias" doesn't work

2017-05-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-20311:
--

> SQL "range(N) as alias" or "range(N) alias" doesn't work
> 
>
> Key: SPARK-20311
> URL: https://issues.apache.org/jira/browse/SPARK-20311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Juliusz Sompolski
>Assignee: Takeshi Yamamuro
>Priority: Minor
>
> `select * from range(10) as A;` or `select * from range(10) A;`
> does not work.
> As a workaround, a subquery has to be used:
> `select * from (select * from range(10)) as A;`



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20311) SQL "range(N) as alias" or "range(N) alias" doesn't work

2017-05-09 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003626#comment-16003626
 ] 

Yin Huai commented on SPARK-20311:
--

It introduced a regression 
(https://github.com/apache/spark/pull/17666#issuecomment-300309896). I have 
reverted the change.

> SQL "range(N) as alias" or "range(N) alias" doesn't work
> 
>
> Key: SPARK-20311
> URL: https://issues.apache.org/jira/browse/SPARK-20311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Juliusz Sompolski
>Assignee: Takeshi Yamamuro
>Priority: Minor
>
> `select * from range(10) as A;` or `select * from range(10) A;`
> does not work.
> As a workaround, a subquery has to be used:
> `select * from (select * from range(10)) as A;`



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20311) SQL "range(N) as alias" or "range(N) alias" doesn't work

2017-05-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-20311:
-
Fix Version/s: (was: 2.2.1)
   (was: 2.3.0)

> SQL "range(N) as alias" or "range(N) alias" doesn't work
> 
>
> Key: SPARK-20311
> URL: https://issues.apache.org/jira/browse/SPARK-20311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Juliusz Sompolski
>Assignee: Takeshi Yamamuro
>Priority: Minor
>
> `select * from range(10) as A;` or `select * from range(10) A;`
> does not work.
> As a workaround, a subquery has to be used:
> `select * from (select * from range(10)) as A;`



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20661) SparkR tableNames() test fails

2017-05-08 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-20661:


Assignee: Hossein Falaki

> SparkR tableNames() test fails
> --
>
> Key: SPARK-20661
> URL: https://issues.apache.org/jira/browse/SPARK-20661
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>Assignee: Hossein Falaki
>  Labels: test
> Fix For: 2.2.0
>
>
> Due to prior state created by other test cases, testing {{tableNames()}} is 
> failing in master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2846/console



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20661) SparkR tableNames() test fails

2017-05-08 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-20661.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17903
[https://github.com/apache/spark/pull/17903]

> SparkR tableNames() test fails
> --
>
> Key: SPARK-20661
> URL: https://issues.apache.org/jira/browse/SPARK-20661
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>  Labels: test
> Fix For: 2.2.0
>
>
> Due to prior state created by other test cases, testing {{tableNames()}} is 
> failing in master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2846/console



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20358) Executors failing stage on interrupted exception thrown by cancelled tasks

2017-04-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-20358:


Assignee: Eric Liang

> Executors failing stage on interrupted exception thrown by cancelled tasks
> --
>
> Key: SPARK-20358
> URL: https://issues.apache.org/jira/browse/SPARK-20358
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>Assignee: Eric Liang
> Fix For: 2.2.0
>
>
> https://issues.apache.org/jira/browse/SPARK-20217 introduced a regression 
> where an interrupted exception now causes a task to fail on cancellation. 
> This is because NonFatal(e) does not match when e is an InterruptedException.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20358) Executors failing stage on interrupted exception thrown by cancelled tasks

2017-04-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-20358.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17659
[https://github.com/apache/spark/pull/17659]

> Executors failing stage on interrupted exception thrown by cancelled tasks
> --
>
> Key: SPARK-20358
> URL: https://issues.apache.org/jira/browse/SPARK-20358
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
> Fix For: 2.2.0
>
>
> https://issues.apache.org/jira/browse/SPARK-20217 introduced a regression 
> where an interrupted exception now causes a task to fail on cancellation. 
> This is because NonFatal(e) does not match when e is an InterruptedException.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-20217:


Assignee: Eric Liang

> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
>Assignee: Eric Liang
> Fix For: 2.2.0
>
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
> Thread.sleep(1000)
>   } catch {
> case t: InterruptedException =>
>   throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculated tasks is few, and 
> usually you need to fail a task a few times before the stage is cancelled, 
> probably no-one noticed this in production.
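Assuming the executor recognizes an un-wrapped InterruptedException from a killed task, as the description implies, a user-side sketch that avoids masking the kill would look like this:

{code}
spark.range(100).repartition(100).foreach { _ =>
  try {
    Thread.sleep(1000)
  } catch {
    case e: InterruptedException =>
      // Restore the interrupt status and rethrow the original exception instead
      // of wrapping it in a RuntimeException, so a driver-initiated kill is
      // reported as a killed task rather than as a task failure.
      Thread.currentThread().interrupt()
      throw e
  }
}
{code}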



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-20217.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17531
[https://github.com/apache/spark/pull/17531]

> Executor should not fail stage if killed task throws non-interrupted exception
> --
>
> Key: SPARK-20217
> URL: https://issues.apache.org/jira/browse/SPARK-20217
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Eric Liang
> Fix For: 2.2.0
>
>
> This is reproducible as follows. Run the following, and then use 
> SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
> fail since we threw a RuntimeException instead of InterruptedException.
> We should probably unconditionally return TaskKilled instead of TaskFailed if 
> the task was killed by the driver, regardless of the actual exception thrown.
> {code}
> spark.range(100).repartition(100).foreach { i =>
>   try {
> Thread.sleep(1000)
>   } catch {
> case t: InterruptedException =>
>   throw new RuntimeException(t)
>   }
> }
> {code}
> Based on the code in TaskSetManager, I think this also affects kills of 
> speculative tasks. However, since the number of speculated tasks is few, and 
> usually you need to fail a task a few times before the stage is cancelled, 
> probably no-one noticed this in production.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14388) Create Table

2017-03-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932960#comment-15932960
 ] 

Yin Huai commented on SPARK-14388:
--

[~erlu] I see. Can you create a JIRA for this? Let's put an example in the 
description of that JIRA to explain the problem. Also, it would be great if you 
could submit a PR to make the change :)

> Create Table
> 
>
> Key: SPARK-14388
> URL: https://issues.apache.org/jira/browse/SPARK-14388
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> For now, we still ask Hive to handle creating Hive tables. We should handle 
> them ourselves.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19620) Incorrect exchange coordinator Id in physical plan

2017-03-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-19620:


Assignee: Carson Wang

> Incorrect exchange coordinator Id in physical plan
> --
>
> Key: SPARK-19620
> URL: https://issues.apache.org/jira/browse/SPARK-19620
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carson Wang
>Assignee: Carson Wang
>Priority: Minor
> Fix For: 2.2.0
>
>
> When adaptive execution is enabled, an exchange coordinator is used in the 
> Exchange operators. For a Join, the same exchange coordinator is used for its 
> two Exchanges, but the physical plan shows two different coordinator IDs, 
> which is confusing.
> Here is an example:
> {code}
> == Physical Plan ==
> *Project [key1#3L, value2#12L]
> +- *SortMergeJoin [key1#3L], [key2#11L], Inner
>:- *Sort [key1#3L ASC NULLS FIRST], false, 0
>:  +- Exchange(coordinator id: 1804587700) hashpartitioning(key1#3L, 10), 
> coordinator[target post-shuffle partition size: 67108864]
>: +- *Project [(id#0L % 500) AS key1#3L]
>:+- *Filter isnotnull((id#0L % 500))
>:   +- *Range (0, 1000, step=1, splits=Some(10))
>+- *Sort [key2#11L ASC NULLS FIRST], false, 0
>   +- Exchange(coordinator id: 793927319) hashpartitioning(key2#11L, 10), 
> coordinator[target post-shuffle partition size: 67108864]
>  +- *Project [(id#8L % 500) AS key2#11L, id#8L AS value2#12L]
> +- *Filter isnotnull((id#8L % 500))
>+- *Range (0, 1000, step=1, splits=Some(10))
> {code}
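A rough sketch of how a plan like the one above can be produced; the config name is the Spark 2.x adaptive-execution flag and should be treated as an assumption here:

{code}
spark.conf.set("spark.sql.adaptive.enabled", "true")
import spark.implicits._

val left  = spark.range(0, 1000, 1, 10).selectExpr("id % 500 AS key1")
val right = spark.range(0, 1000, 1, 10).selectExpr("id % 500 AS key2", "id AS value2")

// Both Exchanges of the join share a single coordinator, so after the fix the
// plan should print the same coordinator id for both of them.
left.join(right, $"key1" === $"key2").select("key1", "value2").explain()
{code}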



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19620) Incorrect exchange coordinator Id in physical plan

2017-03-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-19620.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16952
[https://github.com/apache/spark/pull/16952]

> Incorrect exchange coordinator Id in physical plan
> --
>
> Key: SPARK-19620
> URL: https://issues.apache.org/jira/browse/SPARK-19620
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carson Wang
>Priority: Minor
> Fix For: 2.2.0
>
>
> When adaptive execution is enabled, an exchange coordinator is used in the 
> Exchange operators. For a Join, the same exchange coordinator is used for its 
> two Exchanges, but the physical plan shows two different coordinator IDs, 
> which is confusing.
> Here is an example:
> {code}
> == Physical Plan ==
> *Project [key1#3L, value2#12L]
> +- *SortMergeJoin [key1#3L], [key2#11L], Inner
>:- *Sort [key1#3L ASC NULLS FIRST], false, 0
>:  +- Exchange(coordinator id: 1804587700) hashpartitioning(key1#3L, 10), 
> coordinator[target post-shuffle partition size: 67108864]
>: +- *Project [(id#0L % 500) AS key1#3L]
>:+- *Filter isnotnull((id#0L % 500))
>:   +- *Range (0, 1000, step=1, splits=Some(10))
>+- *Sort [key2#11L ASC NULLS FIRST], false, 0
>   +- Exchange(coordinator id: 793927319) hashpartitioning(key2#11L, 10), 
> coordinator[target post-shuffle partition size: 67108864]
>  +- *Project [(id#8L % 500) AS key2#11L, id#8L AS value2#12L]
> +- *Filter isnotnull((id#8L % 500))
>+- *Range (0, 1000, step=1, splits=Some(10))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19816) DataFrameCallbackSuite doesn't recover the log level

2017-03-03 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-19816:
-
Fix Version/s: 2.1.1

> DataFrameCallbackSuite doesn't recover the log level
> 
>
> Key: SPARK-19816
> URL: https://issues.apache.org/jira/browse/SPARK-19816
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.1.1, 2.2.0
>
>
> "DataFrameCallbackSuite.execute callback functions when a DataFrame action 
> failed" sets the log level to "fatal" but doesn't recover it. Hence, tests 
> running after it won't output any logs except fatal logs.
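A minimal sketch of the save-and-restore pattern such a test can use (log4j 1.x API, which Spark used at the time; the helper name is hypothetical):

{code}
import org.apache.log4j.{Level, Logger}

def withLogLevel[T](level: Level)(body: => T): T = {
  val root = Logger.getRootLogger
  val saved = root.getLevel
  root.setLevel(level)
  try body finally root.setLevel(saved)  // restore even if the body throws
}

// e.g. run the failing DataFrame action quietly, without leaking the level:
withLogLevel(Level.FATAL) {
  // ... trigger the action expected to fail ...
}
{code}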



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19604) Log the start of every Python test

2017-02-15 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869023#comment-15869023
 ] 

Yin Huai commented on SPARK-19604:
--

It has been resolved by https://github.com/apache/spark/pull/16935. 

> Log the start of every Python test
> --
>
> Key: SPARK-19604
> URL: https://issues.apache.org/jira/browse/SPARK-19604
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.0.3, 2.1.1
>
>
> Right now, we only have info level log after we finish the tests of a Python 
> test file. We should also log the start of a test. So, if a test is hanging, 
> we can tell which test file is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19604) Log the start of every Python test

2017-02-15 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-19604.
--
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.0.3

> Log the start of every Python test
> --
>
> Key: SPARK-19604
> URL: https://issues.apache.org/jira/browse/SPARK-19604
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.0.3, 2.1.1
>
>
> Right now, we only have info level log after we finish the tests of a Python 
> test file. We should also log the start of a test. So, if a test is hanging, 
> we can tell which test file is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19604) Log the start of every Python test

2017-02-14 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-19604:


Assignee: Yin Huai

> Log the start of every Python test
> --
>
> Key: SPARK-19604
> URL: https://issues.apache.org/jira/browse/SPARK-19604
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> Right now, we only have info level log after we finish the tests of a Python 
> test file. We should also log the start of a test. So, if a test is hanging, 
> we can tell which test file is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19604) Log the start of every Python test

2017-02-14 Thread Yin Huai (JIRA)
Yin Huai created SPARK-19604:


 Summary: Log the start of every Python test
 Key: SPARK-19604
 URL: https://issues.apache.org/jira/browse/SPARK-19604
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 2.1.0
Reporter: Yin Huai


Right now, we only log at the info level after we finish the tests of a Python 
test file. We should also log the start of each test, so that if a test is 
hanging, we can tell which test file is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19321) Support Hive 2.x's metastore

2017-01-20 Thread Yin Huai (JIRA)
Yin Huai created SPARK-19321:


 Summary: Support Hive 2.x's metastore
 Key: SPARK-19321
 URL: https://issues.apache.org/jira/browse/SPARK-19321
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Yin Huai


It would be good to make Spark work with Hive 2.x metastores. 

We need to add the needed shim classes in 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala,
 make IsolatedClientLoader recognize the new metastore versions 
(https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala),
 and finally add tests in 
https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19295) IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-19295.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16649
[https://github.com/apache/spark/pull/16649]

> IsolatedClientLoader's downloadVersion should log the location of downloaded 
> metastore client jars
> --
>
> Key: SPARK-19295
> URL: https://issues.apache.org/jira/browse/SPARK-19295
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Minor
> Fix For: 2.2.0
>
>
> When you set {{spark.sql.hive.metastore.jars}} to {{maven}}, spark will 
> download metastore client jars and their dependencies. It will be good to log 
> the location of those downloaded jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19295) IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-19295:
-
Priority: Minor  (was: Major)

> IsolatedClientLoader's downloadVersion should log the location of downloaded 
> metastore client jars
> --
>
> Key: SPARK-19295
> URL: https://issues.apache.org/jira/browse/SPARK-19295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Minor
>
> When you set {{spark.sql.hive.metastore.jars}} to {{maven}}, spark will 
> download metastore client jars and their dependencies. It will be good to log 
> the location of those downloaded jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19295) IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-19295:
-
Issue Type: Improvement  (was: Bug)

> IsolatedClientLoader's downloadVersion should log the location of downloaded 
> metastore client jars
> --
>
> Key: SPARK-19295
> URL: https://issues.apache.org/jira/browse/SPARK-19295
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Minor
>
> When you set {{spark.sql.hive.metastore.jars}} to {{maven}}, spark will 
> download metastore client jars and their dependencies. It will be good to log 
> the location of those downloaded jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19295) IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread Yin Huai (JIRA)
Yin Huai created SPARK-19295:


 Summary: IsolatedClientLoader's downloadVersion should log the 
location of downloaded metastore client jars
 Key: SPARK-19295
 URL: https://issues.apache.org/jira/browse/SPARK-19295
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai


When you set {{spark.sql.hive.metastore.jars}} to {{maven}}, Spark will 
download the metastore client jars and their dependencies. It would be good to 
log the location of those downloaded jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18885) unify CREATE TABLE syntax for data source and hive serde tables

2017-01-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18885.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16296
[https://github.com/apache/spark/pull/16296]

> unify CREATE TABLE syntax for data source and hive serde tables
> ---
>
> Key: SPARK-18885
> URL: https://issues.apache.org/jira/browse/SPARK-18885
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
> Attachments: CREATE-TABLE.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19072) Catalyst's IN always returns false for infinity

2017-01-03 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-19072.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16469
[https://github.com/apache/spark/pull/16469]

> Catalyst's IN always returns false for infinity
> ---
>
> Key: SPARK-19072
> URL: https://issues.apache.org/jira/browse/SPARK-19072
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Kay Ousterhout
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>
> This bug was caused by the fix for SPARK-18999 
> (https://github.com/apache/spark/pull/16402)
> This can be reproduced by adding the following test to PredicateSuite.scala 
> (which will consistently fail):
> val value = NonFoldableLiteral(Double.PositiveInfinity, DoubleType)
> checkEvaluation(In(value, List(value)), true)
> This bug is causing 
> org.apache.spark.sql.catalyst.expressions.PredicateSuite.IN to fail 
> approximately 10% of the time (it fails anytime the value is Infinity or 
> -Infinity and the correct answer is True -- e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70826/testReport/org.apache.spark.sql.catalyst.expressions/PredicateSuite/IN/,
>  
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70830/console).
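Not the Catalyst code path itself, but a reminder of the plain Double semantics that make infinity an easy edge case to get wrong in comparison logic:

{code}
val v = Double.PositiveInfinity
assert(v == v)                                 // infinity equals itself...
assert(Seq(v).contains(v))                     // ...so IN over a list containing it should be true
assert((v - v).isNaN)                          // arithmetic-based comparisons break: Inf - Inf is NaN
assert(java.lang.Double.compare(v, v) == 0)    // Double.compare handles infinity correctly
{code}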



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19072) Catalyst's IN always returns false for infinity

2017-01-03 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-19072:
-
Assignee: Wenchen Fan

> Catalyst's IN always returns false for infinity
> ---
>
> Key: SPARK-19072
> URL: https://issues.apache.org/jira/browse/SPARK-19072
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Kay Ousterhout
>Assignee: Wenchen Fan
>
> This bug was caused by the fix for SPARK-18999 
> (https://github.com/apache/spark/pull/16402)
> This can be reproduced by adding the following test to PredicateSuite.scala 
> (which will consistently fail):
> val value = NonFoldableLiteral(Double.PositiveInfinity, DoubleType)
> checkEvaluation(In(value, List(value)), true)
> This bug is causing 
> org.apache.spark.sql.catalyst.expressions.PredicateSuite.IN to fail 
> approximately 10% of the time (it fails anytime the value is Infinity or 
> -Infinity and the correct answer is True -- e.g., 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70826/testReport/org.apache.spark.sql.catalyst.expressions/PredicateSuite/IN/,
>  
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70830/console).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18567) Simplify CreateDataSourceTableAsSelectCommand

2016-12-28 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18567.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 15996
[https://github.com/apache/spark/pull/15996]

> Simplify CreateDataSourceTableAsSelectCommand
> -
>
> Key: SPARK-18567
> URL: https://issues.apache.org/jira/browse/SPARK-18567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16552) Store the Inferred Schemas into External Catalog Tables when Creating Tables

2016-12-28 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783705#comment-15783705
 ] 

Yin Huai commented on SPARK-16552:
--

[~smilegator] [~cloud_fan] I think we will not do partitioning discovery by 
default after SPARK-17861, right? Can you help me check whether we still need 
to write anything about this in the release notes?

> Store the Inferred Schemas into External Catalog Tables when Creating Tables
> 
>
> Key: SPARK-16552
> URL: https://issues.apache.org/jira/browse/SPARK-16552
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>  Labels: release_notes, releasenotes
> Fix For: 2.1.0
>
>
> Currently, in Spark SQL, the initial creation of schema can be classified 
> into two groups. It is applicable to both Hive tables and Data Source tables:
> Group A. Users specify the schema. 
> Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema 
> of the SELECT clause. For example,
> {noformat}
> CREATE TABLE tab STORED AS TEXTFILE
> AS SELECT * from input
> {noformat}
> Case 2 CREATE TABLE: users explicitly specify the schema. For example,
> {noformat}
> CREATE TABLE jsonTable (_1 string, _2 string)
> USING org.apache.spark.sql.json
> {noformat}
> Group B. Spark SQL infers the schema at runtime.
> Case 3 CREATE TABLE. Users do not specify the schema but the path to the file 
> location. For example,
> {noformat}
> CREATE TABLE jsonTable 
> USING org.apache.spark.sql.json
> OPTIONS (path '${tempDir.getCanonicalPath}')
> {noformat}
> Now, Spark SQL does not store the inferred schema in the external catalog for 
> the cases in Group B. When users refresh the metadata cache or access the 
> table for the first time after (re-)starting Spark, Spark SQL will infer the 
> schema and store the info in the metadata cache to improve the performance 
> of subsequent metadata requests. However, the runtime schema inference could 
> cause undesirable schema changes after each reboot of Spark.
> It is desirable to store the inferred schema in the external catalog when 
> creating the table. When users intend to refresh the schema, they issue 
> `REFRESH TABLE`. Spark SQL will infer the schema again based on the 
> previously specified table location and update/refresh the schema in the 
> external catalog and metadata cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18990:
-
Fix Version/s: (was: 2.2.0)

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-18990:
--

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18951) Upgrade com.thoughtworks.paranamer/paranamer to 2.6

2016-12-21 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18951.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16359
[https://github.com/apache/spark/pull/16359]

> Upgrade com.thoughtworks.paranamer/paranamer to 2.6
> ---
>
> Key: SPARK-18951
> URL: https://issues.apache.org/jira/browse/SPARK-18951
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.2.0
>
>
> I recently hit a bug in com.thoughtworks.paranamer/paranamer, which causes 
> jackson to fail to handle a byte array defined in a case class. Then I found 
> https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
> that it is caused by a bug in paranamer. Let's upgrade paranamer. 
> Since we are using jackson 2.6.5, and jackson-module-paranamer 2.6.5 uses 
> com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade 
> paranamer to 2.6. 
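Roughly the shape of the failing scenario, as a sketch using jackson-module-scala; the exact error depends on which paranamer version ends up on the classpath:

{code}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class Payload(name: String, bytes: Array[Byte])

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val json = mapper.writeValueAsString(Payload("a", Array[Byte](1, 2, 3)))
// With the buggy paranamer on the classpath, deserializing the case class with
// the byte-array field is where the failure shows up.
val roundTripped = mapper.readValue(json, classOf[Payload])
{code}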



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18928) FileScanRDD, JDBCRDD, and UnsafeSorter should support task cancellation

2016-12-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18928:
-
Fix Version/s: 2.0.3

> FileScanRDD, JDBCRDD, and UnsafeSorter should support task cancellation
> ---
>
> Key: SPARK-18928
> URL: https://issues.apache.org/jira/browse/SPARK-18928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> Spark tasks respond to cancellation by checking 
> {{TaskContext.isInterrupted()}}, but this check is missing on a few critical 
> paths used in Spark SQL, including FileScanRDD, JDBCRDD, and 
> UnsafeSorter-based sorts. This can cause interrupted / cancelled tasks to 
> continue running and become zombies.
> Here's an example: first, create a giant text file. In my case, I just 
> concatenated /usr/share/dict/words a bunch of times to produce a 2.75 gig 
> file. Then, run a really slow query over that file and try to cancel it:
> {code}
> spark.read.text("/tmp/words").selectExpr("value + value + value").collect()
> {code}
> This will sit and churn at 100% CPU for a minute or two because the task 
> isn't checking the interrupted flag.
> The solution here is to add InterruptedIterator-style checks to a few 
> locations where they're currently missing in Spark SQL.
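The core of the fix is the kind of interruption check that Spark core's own InterruptibleIterator performs; in miniature, and only as an illustration rather than the actual patch:

{code}
import org.apache.spark.{TaskContext, TaskKilledException}

// Wrap a long-running row iterator so the scan notices driver-side cancellation.
class CancellableIterator[T](context: TaskContext, delegate: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = {
    if (context.isInterrupted()) {
      throw new TaskKilledException   // surface the kill instead of spinning at 100% CPU
    }
    delegate.hasNext
  }
  override def next(): T = delegate.next()
}
{code}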



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18761) Uncancellable / unkillable tasks may starve jobs of resources

2016-12-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18761:
-
Fix Version/s: 2.1.1
   2.0.3

> Uncancellable / unkillable tasks may starve jobs of resources
> 
>
> Key: SPARK-18761
> URL: https://issues.apache.org/jira/browse/SPARK-18761
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> Spark's current task cancellation / task killing mechanism is "best effort" 
> in the sense that some tasks may not be interruptible and may not respond to 
> their "killed" flags being set. If a significant fraction of a cluster's task 
> slots are occupied by tasks that have been marked as killed but remain 
> running then this can lead to a situation where new jobs and tasks are 
> starved of resources because zombie tasks are holding resources.
> I propose to address this problem by introducing a "task reaper" mechanism in 
> executors to monitor tasks after they are marked for killing in order to 
> periodically re-attempt the task kill, capture and log stacktraces / warnings 
> if tasks do not exit in a timely manner, and, optionally, kill the entire 
> executor JVM if cancelled tasks cannot be killed within some timeout.
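Not Spark's actual TaskReaper, but a plain-Scala sketch of the monitoring idea described above; the names and policy here are assumptions:

{code}
class ReaperSketch(taskThread: Thread, pollMs: Long, timeoutMs: Long) extends Thread {
  setDaemon(true)
  override def run(): Unit = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (taskThread.isAlive && System.currentTimeMillis() < deadline) {
      taskThread.interrupt()                                       // periodically re-attempt the kill
      taskThread.getStackTrace.foreach(e => println(s"  at $e"))   // log where the task is stuck
      Thread.sleep(pollMs)
    }
    if (taskThread.isAlive) {
      // Last resort, mirroring the optional behaviour above: give up on the JVM.
      Runtime.getRuntime.halt(1)
    }
  }
}
{code}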



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18953) Do not show the link to a dead worker on the master page

2016-12-20 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18953:


 Summary: Do not show the link to a dead worker on the master page
 Key: SPARK-18953
 URL: https://issues.apache.org/jira/browse/SPARK-18953
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Yin Huai


The master page still seems to show links to dead workers. For a dead worker, 
we will not be able to see its worker page anyway, so it makes sense not to 
show links to dead workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18951) Upgrade com.thoughtworks.paranamer/paranamer to 2.6

2016-12-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18951:
-
Description: 
I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes 
jackson fail to handle byte array defined in a case class. Then I find 
https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
that it is caused by a bug in paranamer. Let's upgrade paranamer. 

Since we are using jackson 2.6.5 and jackson-module-paranamer 2.6.5 use 
com.thoughtworks.paranamer/paranamer 2.6, I suggests that we upgrade paranamer 
to 2.6. 

  was:
I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes 
jackson fail to handle byte array defined in a case class. Then I find 
https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
that it is caused by a bug in paranamer. Let's upgrade paranamer. 

Since we are using jackson 2.6.5 and jackson-module-paranamer 2.6.5 use 
com.thoughtworks.paranamer/paranamer uses 2.6, I suggests that we upgrade 
paranamer to 2.6. 


> Upgrade com.thoughtworks.paranamer/paranamer to 2.6
> ---
>
> Key: SPARK-18951
> URL: https://issues.apache.org/jira/browse/SPARK-18951
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes 
> jackson fail to handle byte array defined in a case class. Then I find 
> https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
> that it is caused by a bug in paranamer. Let's upgrade paranamer. 
> Since we are using jackson 2.6.5 and jackson-module-paranamer 2.6.5 use 
> com.thoughtworks.paranamer/paranamer 2.6, I suggests that we upgrade 
> paranamer to 2.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18951) Upgrade com.thoughtworks.paranamer/paranamer to 2.6

2016-12-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-18951:


Assignee: Yin Huai

> Upgrade com.thoughtworks.paranamer/paranamer to 2.6
> ---
>
> Key: SPARK-18951
> URL: https://issues.apache.org/jira/browse/SPARK-18951
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes 
> jackson fail to handle byte array defined in a case class. Then I find 
> https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
> that it is caused by a bug in paranamer. Let's upgrade paranamer. 
> Since we are using jackson 2.6.5 and jackson-module-paranamer 2.6.5 use 
> com.thoughtworks.paranamer/paranamer uses 2.6, I suggests that we upgrade 
> paranamer to 2.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18951) Upgrade com.thoughtworks.paranamer/paranamer

2016-12-20 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18951:


 Summary: Upgrade com.thoughtworks.paranamer/paranamer
 Key: SPARK-18951
 URL: https://issues.apache.org/jira/browse/SPARK-18951
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Yin Huai


I recently hit a bug in com.thoughtworks.paranamer/paranamer, which causes 
jackson to fail to handle a byte array defined in a case class. Then I found 
https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
that it is caused by a bug in paranamer. Let's upgrade paranamer. 

Since we are using jackson 2.6.5, and jackson-module-paranamer 2.6.5 uses 
com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade 
paranamer to 2.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18951) Upgrade com.thoughtworks.paranamer/paranamer to 2.6

2016-12-20 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18951:
-
Summary: Upgrade com.thoughtworks.paranamer/paranamer to 2.6  (was: Upgrade 
com.thoughtworks.paranamer/paranamer)

> Upgrade com.thoughtworks.paranamer/paranamer to 2.6
> ---
>
> Key: SPARK-18951
> URL: https://issues.apache.org/jira/browse/SPARK-18951
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Yin Huai
>
> I recently hit a bug in com.thoughtworks.paranamer/paranamer, which causes 
> jackson to fail to handle a byte array defined in a case class. Then I found 
> https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests 
> that it is caused by a bug in paranamer. Let's upgrade paranamer. 
> Since the jackson 2.6.5 and jackson-module-paranamer 2.6.5 that we are using 
> depend on com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade 
> paranamer to 2.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18761) Uncancellable / unkillable tasks may starve jobs of resources

2016-12-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18761.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16189
[https://github.com/apache/spark/pull/16189]

> Uncancellable / unkillable tasks may starve jobs of resources
> 
>
> Key: SPARK-18761
> URL: https://issues.apache.org/jira/browse/SPARK-18761
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.2.0
>
>
> Spark's current task cancellation / task killing mechanism is "best effort" 
> in the sense that some tasks may not be interruptible and may not respond to 
> their "killed" flags being set. If a significant fraction of a cluster's task 
> slots are occupied by tasks that have been marked as killed but remain 
> running, then this can lead to a situation where new jobs and tasks are 
> starved of resources because zombie tasks are holding resources.
> I propose to address this problem by introducing a "task reaper" mechanism in 
> executors to monitor tasks after they are marked for killing in order to 
> periodically re-attempt the task kill, capture and log stacktraces / warnings 
> if tasks do not exit in a timely manner, and, optionally, kill the entire 
> executor JVM if cancelled tasks cannot be killed within some timeout.
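
To make the proposal concrete, below is a minimal sketch of the reaper idea in 
plain Scala. It is not Spark's actual implementation; the name monitorKilledTask, 
the polling/timeout values, and the use of System.exit are assumptions made for 
illustration.

{code}
import java.util.concurrent.{Executors, TimeUnit}

object TaskReaperSketch {
  private val reaperPool = Executors.newSingleThreadScheduledExecutor()

  // Monitor a task thread that has already been asked to die: periodically
  // re-attempt the interrupt, log where it is stuck, and, once a timeout has
  // passed, give up and terminate the executor JVM so its slots are freed.
  def monitorKilledTask(
      taskThread: Thread,
      pollIntervalMs: Long = 10000L,
      killTimeoutMs: Long = 120000L): Unit = {
    val deadline = System.currentTimeMillis() + killTimeoutMs
    val poller: Runnable = new Runnable {
      override def run(): Unit = {
        if (!taskThread.isAlive) {
          return // the task finally exited; stop monitoring
        }
        taskThread.interrupt() // re-attempt the kill
        val stack = taskThread.getStackTrace.mkString("\n  at ")
        println(s"Killed task is still running:\n  at $stack")
        if (System.currentTimeMillis() > deadline) {
          println("Killed task did not exit in time; terminating executor JVM")
          System.exit(1) // the optional last resort described above
        } else {
          reaperPool.schedule(this, pollIntervalMs, TimeUnit.MILLISECONDS)
        }
      }
    }
    reaperPool.schedule(poller, pollIntervalMs, TimeUnit.MILLISECONDS)
  }
}
{code}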



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18921) check database existence with Hive.databaseExists instead of getDatabase

2016-12-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18921.
--
   Resolution: Fixed
Fix Version/s: 2.1.1

Issue resolved by pull request 16332
[https://github.com/apache/spark/pull/16332]

> check database existence with Hive.databaseExists instead of getDatabase
> 
>
> Key: SPARK-18921
> URL: https://issues.apache.org/jira/browse/SPARK-18921
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 2.1.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2016-12-13 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13747.
--
   Resolution: Fixed
Fix Version/s: (was: 2.0.2)
   (was: 2.1.0)
   2.2.0

Issue resolved by pull request 16230
[https://github.com/apache/spark/pull/16230]

> Concurrent execution in SQL doesn't work with Scala ForkJoinPool
> 
>
> Key: SPARK-13747
> URL: https://issues.apache.org/jira/browse/SPARK-13747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.2.0
>
>
> Running the following code may fail
> {code}
> (1 to 100).par.foreach { _ =>
>   println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
> }
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> This is because SparkContext.runJob can be suspended when using a 
> ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global), as it 
> calls Await.ready (introduced by https://github.com/apache/spark/pull/9264).
> So when SparkContext.runJob is suspended, the ForkJoinPool will run another task 
> in the same thread; however, the local properties have already been polluted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-13 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18675.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16104
[https://github.com/apache/spark/pull/16104]

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-12 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18816:
-
Assignee: Alex Bozarth

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Assignee: Alex Bozarth
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-12 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743298#comment-15743298
 ] 

Yin Huai commented on SPARK-18816:
--

Yea, log pages are still there. But, without those links on the executor page, 
it is very hard to find those pages. 

btw, is there any place that we should look at to find the cause of this 
problem?

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-12 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18816:
-
Priority: Blocker  (was: Major)

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-12 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743260#comment-15743260
 ] 

Yin Huai commented on SPARK-18816:
--

[~ajbozarth] Yea, please take a look. Thanks! 

The reasons that I set it as a blocker are (1) those log links are super 
important for debugging; and (2) it is a regression from 2.0.

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-09 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737254#comment-15737254
 ] 

Yin Huai commented on SPARK-18816:
--

btw, my testing was done with Chrome.


I then terminated the cluster and started a new one. I first launched the workers. 
Then, I still could not see the log links on the page in Chrome, but I could see 
the links from Safari. 

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18816:
-
Attachment: screenshot-1.png

> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-09 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18816:


 Summary: executor page fails to show log links if executors are 
added after an app is launched
 Key: SPARK-18816
 URL: https://issues.apache.org/jira/browse/SPARK-18816
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Yin Huai
Priority: Blocker
 Attachments: screenshot-1.png

How to reproduce with standalone mode:
1. Launch a spark master
2. Launch a spark shell. At this point, there is no executor associated with 
this application. 
3. Launch a slave. Now, there is an executor assigned to the spark shell. 
However, there is no link to stdout/stderr on the executor page.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18816) executor page fails to show log links if executors are added after an app is launched

2016-12-09 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18816:
-
Description: 
How to reproduce with standalone mode:
1. Launch a spark master
2. Launch a spark shell. At this point, there is no executor associated with 
this application. 
3. Launch a slave. Now, there is an executor assigned to the spark shell. 
However, there is no link to stdout/stderr on the executor page (please see 
https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



  was:
How to reproduce with standalone mode:
1. Launch a spark master
2. Launch a spark shell. At this point, there is no executor associated with 
this application. 
3. Launch a slave. Now, there is an executor assigned to the spark shell. 
However, there is no link to stdout/stderr on the executor page.




> executor page fails to show log links if executors are added after an app is 
> launched
> -
>
> Key: SPARK-18816
> URL: https://issues.apache.org/jira/browse/SPARK-18816
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Yin Huai
>Priority: Blocker
> Attachments: screenshot-1.png
>
>
> How to reproduce with standalone mode:
> 1. Launch a spark master
> 2. Launch a spark shell. At this point, there is no executor associated with 
> this application. 
> 3. Launch a slave. Now, there is an executor assigned to the spark shell. 
> However, there is no link to stdout/stderr on the executor page (please see 
> https://issues.apache.org/jira/secure/attachment/12842649/screenshot-1.png).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18284) Schema of DataFrame generated from RDD is different between master and 2.0

2016-12-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723012#comment-15723012
 ] 

Yin Huai commented on SPARK-18284:
--

[~kiszk] btw, do we know what caused the nullable setting change in 2.1?

> Schema of DataFrame generated from RDD is different between master and 2.0
> -
>
> Key: SPARK-18284
> URL: https://issues.apache.org/jira/browse/SPARK-18284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0
>
>
> When the following program is executed, the schema of the dataframe is different 
> among master, branch 2.0, and branch 2.1. The nullable flag should be false.
> {code:java}
> val df = sparkContext.parallelize(1 to 8, 1).toDF()
> df.printSchema
> df.filter("value > 4").count
> === master ===
> root
>  |-- value: integer (nullable = true)
> === branch 2.1 ===
> root
>  |-- value: integer (nullable = true)
> === branch 2.0 ===
> root
>  |-- value: integer (nullable = false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18660) Parquet complains "Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

2016-11-30 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18660:


 Summary: Parquet complains "Can not initialize counter due to 
context is not a instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl "
 Key: SPARK-18660
 URL: https://issues.apache.org/jira/browse/SPARK-18660
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai


Parquet record reader always complains "Can not initialize counter due to 
context is not a instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl". Looks like we always 
create TaskAttemptContextImpl 
(https://github.com/apache/spark/blob/2f7461f31331cfc37f6cfa3586b7bbefb3af5547/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L368).
 But Parquet wants a TaskInputOutputContext, whose implementation is a subclass of 
TaskAttemptContextImpl. 
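
A small sketch of the mechanism (illustrative only, not Parquet's actual code): 
the record reader can wire up its counters only when the context really is a 
TaskInputOutputContext, so the plain TaskAttemptContextImpl passed in by Spark 
fails the check and the warning is logged instead.

{code}
import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskInputOutputContext}

object CounterCheckSketch {
  // Hypothetical helper mirroring the check described above; not Parquet's code.
  def canInitializeCounters(context: TaskAttemptContext): Boolean =
    context.isInstanceOf[TaskInputOutputContext[_, _, _, _]]
}
{code}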



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18631) Avoid making data skew worse in ExchangeCoordinator

2016-11-29 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18631.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16065
[https://github.com/apache/spark/pull/16065]

> Avoid making data skew worse in ExchangeCoordinator
> ---
>
> Key: SPARK-18631
> URL: https://issues.apache.org/jira/browse/SPARK-18631
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.0
>Reporter: Mark Hamstra
>Assignee: Mark Hamstra
> Fix For: 2.2.0
>
>
> The logic to resize partitions in the ExchangeCoordinator is to not start a 
> new partition until the targetPostShuffleInputSize is equalled or exceeded.  
> This can make data skew problems worse since a number of small partitions can 
> first be combined as long as the combined size remains smaller than the 
> targetPostShuffleInputSize, and then a large, data-skewed partition can be 
> further combined, making it even bigger than it already was.
> It's fairly simple to change the logic to create a new partition if adding 
> a new piece would exceed the targetPostShuffleInputSize instead of only 
> creating a new partition after the targetPostShuffleInputSize has already 
> been exceeded.  This results in a few more partitions being created by the 
> ExchangeCoordinator, but data skew problems are at least not made worse even 
> though they are not made any better.
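
A small standalone sketch (not the actual ExchangeCoordinator code) contrasting 
the two strategies; here sizes stands for post-shuffle partition sizes in bytes 
and target for targetPostShuffleInputSize:

{code}
import scala.collection.mutable.ArrayBuffer

object CoalesceSketch {
  // Current behaviour: only start a new group once the current group has already
  // reached the target, so a huge skewed piece can still land in an almost-full
  // group and make it even bigger.
  def coalesceAfterExceeding(sizes: Seq[Long], target: Long): Seq[Seq[Long]] = {
    val groups = ArrayBuffer(ArrayBuffer.empty[Long])
    for (s <- sizes) {
      if (groups.last.sum >= target) groups += ArrayBuffer.empty[Long]
      groups.last += s
    }
    groups.map(_.toList).toList
  }

  // Proposed behaviour: start a new group as soon as adding the next piece would
  // exceed the target, so existing groups are never inflated by a skewed piece.
  def coalesceBeforeExceeding(sizes: Seq[Long], target: Long): Seq[Seq[Long]] = {
    val groups = ArrayBuffer(ArrayBuffer.empty[Long])
    for (s <- sizes) {
      if (groups.last.nonEmpty && groups.last.sum + s > target) {
        groups += ArrayBuffer.empty[Long]
      }
      groups.last += s
    }
    groups.map(_.toList).toList
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq(10L, 10L, 100L)
    println(coalesceAfterExceeding(sizes, 30L))  // List(List(10, 10, 100)): skew made worse
    println(coalesceBeforeExceeding(sizes, 30L)) // List(List(10, 10), List(100))
  }
}
{code}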



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18468) Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet relation with decimal column

2016-11-29 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18468:
-
Target Version/s:   (was: 2.1.0)

> Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist 
> Parquet relation with decimal column
> --
>
> Key: SPARK-18468
> URL: https://issues.apache.org/jira/browse/SPARK-18468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Priority: Critical
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.1-test-sbt-hadoop-2.4/71/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_9757_Persist_Parquet_relation_with_decimal_column/
> https://spark-tests.appspot.com/builds/spark-branch-2.1-test-sbt-hadoop-2.4/71
> Seems we failed to stop the driver
> {code}
> 2016-11-15 18:36:47.76 - stderr> org.apache.spark.rpc.RpcTimeoutException: 
> Cannot receive any reply in 120 seconds. This timeout is controlled by 
> spark.rpc.askTimeout
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> 2016-11-15 18:36:47.76 - stderr>  at scala.util.Try$.apply(Try.scala:192)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.util.Failure.recover(Try.scala:216)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> 2016-11-15 18:36:47.76 - stderr>  at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Promise$class.complete(Promise.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Promise$class.t

[jira] [Assigned] (SPARK-18602) Dependency list still shows that the version of org.codehaus.janino:commons-compiler is 2.7.6

2016-11-28 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-18602:


Assignee: Yin Huai

> Dependency list still shows that the version of 
> org.codehaus.janino:commons-compiler is 2.7.6
> -
>
> Key: SPARK-18602
> URL: https://issues.apache.org/jira/browse/SPARK-18602
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.1.0
>
>
> org.codehaus.janino:janino:3.0.0 depends on 
> org.codehaus.janino:commons-compiler:3.0.0.
> However, 
> https://github.com/apache/spark/blob/branch-2.1/dev/deps/spark-deps-hadoop-2.7
>  still shows that commons-compiler from janino is 2.7.6. This is probably 
> because the hive module depends on calcite-core, which depends on 
> commons-compiler 2.7.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18602) Dependency list still shows that the version of org.codehaus.janino:commons-compiler is 2.7.6

2016-11-28 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18602.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 16025
[https://github.com/apache/spark/pull/16025]

> Dependency list still shows that the version of 
> org.codehaus.janino:commons-compiler is 2.7.6
> -
>
> Key: SPARK-18602
> URL: https://issues.apache.org/jira/browse/SPARK-18602
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.1.0
>Reporter: Yin Huai
> Fix For: 2.1.0
>
>
> org.codehaus.janino:janino:3.0.0 depends on 
> org.codehaus.janino:commons-compiler:3.0.0.
> However, 
> https://github.com/apache/spark/blob/branch-2.1/dev/deps/spark-deps-hadoop-2.7
>  still shows that commons-compiler from janino is 2.7.6. This is probably 
> because the hive module depends on calcite-core, which depends on 
> commons-compiler 2.7.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18602) Dependency list still shows that the version of org.codehaus.janino:commons-compiler is 2.7.6

2016-11-27 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18602:


 Summary: Dependency list still shows that the version of 
org.codehaus.janino:commons-compiler is 2.7.6
 Key: SPARK-18602
 URL: https://issues.apache.org/jira/browse/SPARK-18602
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Affects Versions: 2.1.0
Reporter: Yin Huai


org.codehaus.janino:janino:3.0.0 depends on 
org.codehaus.janino:commons-compiler:3.0.0.

However, 
https://github.com/apache/spark/blob/branch-2.1/dev/deps/spark-deps-hadoop-2.7 
still shows that commons-compiler from janino is 2.7.6. This is probably 
because the hive module depends on calcite-core, which depends on commons-compiler 
2.7.6.
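
For illustration only (this is not how the Spark build was actually changed), a 
generic way to force the newer commons-compiler onto the classpath in an sbt 
project is a dependency override:

{code}
// Hypothetical build.sbt snippet, shown only to illustrate pinning the transitive
// commons-compiler version; it is not taken from Spark's build files.
dependencyOverrides += "org.codehaus.janino" % "commons-compiler" % "3.0.0"
{code}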



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18544) Append with df.saveAsTable writes data to wrong location

2016-11-22 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18544:
-
Priority: Blocker  (was: Major)

> Append with df.saveAsTable writes data to wrong location
> 
>
> Key: SPARK-18544
> URL: https://issues.apache.org/jira/browse/SPARK-18544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Priority: Blocker
>
> When using saveAsTable in append mode, data will be written to the wrong 
> location for non-managed Datasource tables. The following example illustrates 
> this.
> It seems we somehow pass the wrong table path to InsertIntoHadoopFsRelation from 
> DataFrameWriter. Also, we should probably remove the repair table call at the 
> end of saveAsTable in DataFrameWriter. That shouldn't be needed in either the 
> Hive or Datasource case.
> {code}
> scala> spark.sqlContext.range(1).selectExpr("id", "id as A", "id as 
> B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test_10k")
> scala> sql("msck repair table test_10k")
> scala> sql("select * from test_10k where A = 1").count
> res6: Long = 1
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as 
> B").write.partitionBy("A", "B").mode("append").parquet("/tmp/test_10k")
> scala> sql("select * from test_10k where A = 1").count
> res8: Long = 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15513) Bzip2Factory in Hadoop 2.7.1 is not thread safe

2016-11-21 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-15513.
--
Resolution: Won't Fix

> Bzip2Factory in Hadoop 2.7.1 is not thread safe
> ---
>
> Key: SPARK-15513
> URL: https://issues.apache.org/jira/browse/SPARK-15513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
> Environment: Hadoop 2.7.1
>Reporter: Yin Huai
>
> This is caused by https://issues.apache.org/jira/browse/HADOOP-12191. While one 
> thread is loading the native bzip2 lib, other threads think that the 
> native bzip2 lib is not available and then throw exceptions. 
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 
> (TID 37, localhost): java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:65)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:81)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:102)
>   at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:95)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1205)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1278)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   Suppressed: java.lang.UnsupportedOperationException
>   at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:89)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
>   at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
>   at 
> org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1296)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
>   at 
> org.apache.spark.schedul

[jira] [Commented] (SPARK-15513) Bzip2Factory in Hadoop 2.7.1 is not thread safe

2016-11-21 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684302#comment-15684302
 ] 

Yin Huai commented on SPARK-15513:
--

I am closing this jira since the fix has been released with 2.7.2.

> Bzip2Factory in Hadoop 2.7.1 is not thread safe
> ---
>
> Key: SPARK-15513
> URL: https://issues.apache.org/jira/browse/SPARK-15513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
> Environment: Hadoop 2.7.1
>Reporter: Yin Huai
>
> This is caused by https://issues.apache.org/jira/browse/HADOOP-12191. While one 
> thread is loading the native bzip2 lib, other threads think that the 
> native bzip2 lib is not available and then throw exceptions. 
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 
> (TID 37, localhost): java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:65)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:81)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:102)
>   at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:95)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1205)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1278)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   Suppressed: java.lang.UnsupportedOperationException
>   at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:89)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
>   at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
>   at 
> org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1296)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGSche

[jira] [Comment Edited] (SPARK-15513) Bzip2Factory in Hadoop 2.7.1 is not thread safe

2016-11-21 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684302#comment-15684302
 ] 

Yin Huai edited comment on SPARK-15513 at 11/21/16 6:17 PM:


I am closing this jira since the fix has been released with hadoop 2.7.2.


was (Author: yhuai):
I am closing this jira since the fix has been released with 2.7.2.

> Bzip2Factory in Hadoop 2.7.1 is not thread safe
> ---
>
> Key: SPARK-15513
> URL: https://issues.apache.org/jira/browse/SPARK-15513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
> Environment: Hadoop 2.7.1
>Reporter: Yin Huai
>
> This is caused by https://issues.apache.org/jira/browse/HADOOP-12191. While one 
> thread is loading the native bzip2 lib, other threads think that the 
> native bzip2 lib is not available and then throw exceptions. 
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 
> (TID 37, localhost): java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:65)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:81)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:102)
>   at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:95)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1205)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1278)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1190)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   Suppressed: java.lang.UnsupportedOperationException
>   at 
> org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:89)
>   at 
> org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
>   at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
>   at 
> org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1211)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1296)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at scala.Option.foreach(Option.scala:236

[jira] [Resolved] (SPARK-18360) default table path of tables in default database should depend on the location of default database

2016-11-17 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18360.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 15812
[https://github.com/apache/spark/pull/15812]

> default table path of tables in default database should depend on the 
> location of default database
> --
>
> Key: SPARK-18360
> URL: https://issues.apache.org/jira/browse/SPARK-18360
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>  Labels: release_notes, releasenotes
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18360) default table path of tables in default database should depend on the location of default database

2016-11-17 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18360:
-
Labels: release_notes releasenotes  (was: )

> default table path of tables in default database should depend on the 
> location of default database
> --
>
> Key: SPARK-18360
> URL: https://issues.apache.org/jira/browse/SPARK-18360
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>  Labels: release_notes, releasenotes
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18468) Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet relation with decimal column

2016-11-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18468:
-
Description: 
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.1-test-sbt-hadoop-2.4/71/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_9757_Persist_Parquet_relation_with_decimal_column/

https://spark-tests.appspot.com/builds/spark-branch-2.1-test-sbt-hadoop-2.4/71

Seems we failed to stop the driver
{code}
2016-11-15 18:36:47.76 - stderr> org.apache.spark.rpc.RpcTimeoutException: 
Cannot receive any reply in 120 seconds. This timeout is controlled by 
spark.rpc.askTimeout
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
2016-11-15 18:36:47.76 - stderr>at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
2016-11-15 18:36:47.76 - stderr>at 
scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
2016-11-15 18:36:47.76 - stderr>at scala.util.Try$.apply(Try.scala:192)
2016-11-15 18:36:47.76 - stderr>at 
scala.util.Failure.recover(Try.scala:216)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
2016-11-15 18:36:47.76 - stderr>at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Promise$class.complete(Promise.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
2016-11-15 18:36:47.76 - stderr>at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2016-11-15 18:36:47.76 - stderr>at 
java.util.concurrent.FutureTask.run(Futu

[jira] [Updated] (SPARK-18468) Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet relation with decimal column

2016-11-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18468:
-
Component/s: (was: SQL)
 Spark Core

> Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist 
> Parquet relation with decimal column
> --
>
> Key: SPARK-18468
> URL: https://issues.apache.org/jira/browse/SPARK-18468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Priority: Critical
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.1-test-sbt-hadoop-2.4/71/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_9757_Persist_Parquet_relation_with_decimal_column/
> Seems we failed to stop the driver
> {code}
> 2016-11-15 18:36:47.76 - stderr> org.apache.spark.rpc.RpcTimeoutException: 
> Cannot receive any reply in 120 seconds. This timeout is controlled by 
> spark.rpc.askTimeout
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> 2016-11-15 18:36:47.76 - stderr>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> 2016-11-15 18:36:47.76 - stderr>  at scala.util.Try$.apply(Try.scala:192)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.util.Failure.recover(Try.scala:216)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> 2016-11-15 18:36:47.76 - stderr>  at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Promise$class.complete(Promise.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> 2016-11-15 18:36:47.76 - stderr>  at 
> scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> 2016-11-15 18:36:47.76 - st

[jira] [Resolved] (SPARK-18186) Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support

2016-11-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18186.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 15703
[https://github.com/apache/spark/pull/15703]

> Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation 
> support
> 
>
> Key: SPARK-18186
> URL: https://issues.apache.org/jira/browse/SPARK-18186
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2, 2.0.1
>Reporter: Cheng Lian
>Assignee: Cheng Lian
> Fix For: 2.2.0
>
>
> Currently, Hive UDAFs in Spark SQL don't support partial aggregation: any 
> query involving a Hive UDAF has to fall back to {{SortAggregateExec}} 
> without partial aggregation.
> This issue can be fixed by migrating {{HiveUDAFFunction}} to 
> {{TypedImperativeAggregate}}, which already provides partial aggregation 
> support for aggregate functions that may use arbitrary Java objects as 
> aggregation states.
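
The linked PR migrates {{HiveUDAFFunction}} itself, and the internal {{TypedImperativeAggregate}} contract additionally requires buffer serialize/deserialize hooks that are not shown here. Purely to illustrate why an explicit buffer type plus separate reduce (per partition) and merge (across partitions) steps make partial aggregation possible, here is a minimal sketch using the public {{Aggregator}} API; the names {{SumCount}} and {{MyAverage}} are invented for the example.

{code}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Mutable intermediate buffer carried between partial and final aggregation.
case class SumCount(var sum: Double, var count: Long)

object MyAverage extends Aggregator[Double, SumCount, Double] {
  // Empty buffer for a new group.
  def zero: SumCount = SumCount(0.0, 0L)
  // Partial aggregation: fold one input value into a per-partition buffer.
  def reduce(buf: SumCount, value: Double): SumCount = {
    buf.sum += value; buf.count += 1L; buf
  }
  // Final aggregation: combine buffers coming from different partitions.
  def merge(b1: SumCount, b2: SumCount): SumCount = {
    b1.sum += b2.sum; b1.count += b2.count; b1
  }
  // Produce the final result from the merged buffer.
  def finish(buf: SumCount): Double =
    if (buf.count == 0L) Double.NaN else buf.sum / buf.count
  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage on a Dataset[Double]: ds.select(MyAverage.toColumn)
{code}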



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18468) Flaky test: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet relation with decimal column

2016-11-15 Thread Yin Huai (JIRA)
Yin Huai created SPARK-18468:


 Summary: Flaky test: 
org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-9757 Persist Parquet 
relation with decimal column
 Key: SPARK-18468
 URL: https://issues.apache.org/jira/browse/SPARK-18468
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Yin Huai
Priority: Critical


https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.1-test-sbt-hadoop-2.4/71/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/SPARK_9757_Persist_Parquet_relation_with_decimal_column/

It seems we failed to stop the driver; a sketch of raising the relevant RPC timeout follows the trace below.
{code}
2016-11-15 18:36:47.76 - stderr> org.apache.spark.rpc.RpcTimeoutException: 
Cannot receive any reply in 120 seconds. This timeout is controlled by 
spark.rpc.askTimeout
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
2016-11-15 18:36:47.76 - stderr>at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
2016-11-15 18:36:47.76 - stderr>at 
scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
2016-11-15 18:36:47.76 - stderr>at scala.util.Try$.apply(Try.scala:192)
2016-11-15 18:36:47.76 - stderr>at 
scala.util.Failure.recover(Try.scala:216)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
2016-11-15 18:36:47.76 - stderr>at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Promise$class.complete(Promise.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
2016-11-15 18:36:47.76 - stderr>at 
scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
2016-11-15 18:36:47.76 - stderr>at 
org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
2016-1
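
If the root cause is simply that 120 seconds of RPC headroom is too tight on a slow CI machine, the keys involved are {{spark.rpc.askTimeout}} and its fallback {{spark.network.timeout}}. A minimal sketch of raising them, assuming a locally built session rather than the spark-submit launch the suite actually performs (where the same keys could be passed with {{--conf}}):

{code}
import org.apache.spark.sql.SparkSession

// Illustrative only: give driver<->executor RPC calls more time to reply
// before "Cannot receive any reply in ... seconds" is raised.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("flaky-test-repro")
  .config("spark.rpc.askTimeout", "300s")
  .config("spark.network.timeout", "300s")
  .getOrCreate()

// ... run the test body ...

spark.stop()
{code}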

[jira] [Commented] (SPARK-18464) Spark SQL fails to load tables created without providing a schema

2016-11-15 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669057#comment-15669057
 ] 

Yin Huai commented on SPARK-18464:
--

cc [~cloud_fan]

> Spark SQL fails to load tables created without providing a schema
> -
>
> Key: SPARK-18464
> URL: https://issues.apache.org/jira/browse/SPARK-18464
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yin Huai
>Priority: Blocker
>
> I have an old table that was created without providing a schema. It seems 
> branch 2.1 fails to load it and reports that the schema is corrupt.
> With {{spark.sql.debug}} enabled, I retrieved the metadata below using 
> {{describe formatted}}; a sketch of those commands follows the output.
> {code}
> [col,array,from deserializer]
> [,,]
> [# Detailed Table Information,,]
> [Database:,mydb,]
> [Owner:,root,]
> [Create Time:,Fri Jun 17 11:55:07 UTC 2016,]
> [Last Access Time:,Thu Jan 01 00:00:00 UTC 1970,]
> [Location:,mylocation,]
> [Table Type:,EXTERNAL,]
> [Table Parameters:,,]
> [  transient_lastDdlTime,1466164507,]
> [  spark.sql.sources.provider,parquet,]
> [,,]
> [# Storage Information,,]
> [SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
> [InputFormat:,org.apache.hadoop.mapred.SequenceFileInputFormat,]
> [OutputFormat:,org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,]
> [Compressed:,No,]
> [Storage Desc Parameters:,,]
> [  path,/myPatch,]
> [  serialization.format,1,]
> {code}
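
For context, here is a minimal sketch of how such a metadata dump can be obtained. The session settings and the table name {{mydb.my_old_table}} are placeholders, {{spark.sql.debug}} is an internal flag intended only for debugging, and this assumes the flag can be set at session creation.

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical session; with spark.sql.debug enabled the catalog returns the
// raw table metadata instead of trying to restore the Spark-specific schema,
// which is how a dump like the one above can be produced.
val spark = SparkSession.builder()
  .appName("inspect-table-metadata")
  .config("spark.sql.debug", "true")
  .enableHiveSupport()
  .getOrCreate()

// Dump the catalog metadata, including table and serde parameters.
spark.sql("DESCRIBE FORMATTED mydb.my_old_table").show(100, truncate = false)
{code}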



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


