[jira] [Created] (SPARK-16946) saveAsTable[append] with different number of columns should throw Exception

2016-08-07 Thread Huaxin Gao (JIRA)
Huaxin Gao created SPARK-16946:
--

 Summary: saveAsTable[append] with different number of columns 
should throw Exception
 Key: SPARK-16946
 URL: https://issues.apache.org/jira/browse/SPARK-16946
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Huaxin Gao
Priority: Minor


In HiveContext, if saveAsTable[append] is called with a different number of columns, 
Spark throws an AnalysisException.
e.g.
{code}
test("saveAsTable[append]: too many columns") {
  withTable("saveAsTable_too_many_columns") {
Seq((1, 2)).toDF("i", 
"j").write.saveAsTable("saveAsTable_too_many_columns")
val e = intercept[AnalysisException] {
  Seq((3, 4, 5)).toDF("i", "j", 
"k").write.mode("append").saveAsTable("saveAsTable_too_many_columns")
}
assert(e.getMessage.contains("doesn't match"))
  }
}
{code}

However, in SparkSession or SQLContext, running the same code silently drops the 
extra column in the appended data, without any warning or exception. The table 
becomes:

i  j
3  4
1  2

We may want to follow the HiveContext behavior and throw an exception.
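
For comparison, a minimal sketch of the silent-drop behavior with a plain 
SparkSession (illustrative only; assumes a spark-shell session named {{spark}} 
with {{spark.implicits._}} imported, mirroring the test above):

{code}
// Illustrative only: with SparkSession/SQLContext the extra column "k" is
// silently dropped instead of raising an AnalysisException.
import spark.implicits._

Seq((1, 2)).toDF("i", "j").write.saveAsTable("saveAsTable_too_many_columns")
Seq((3, 4, 5)).toDF("i", "j", "k")
  .write.mode("append").saveAsTable("saveAsTable_too_many_columns")

spark.table("saveAsTable_too_many_columns").show()
// +---+---+
// |  i|  j|
// +---+---+
// |  3|  4|
// |  1|  2|
// +---+---+
{code}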






[jira] [Commented] (SPARK-16606) Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an existing SparkContext, some configuration may not take effect."

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411328#comment-15411328
 ] 

Apache Spark commented on SPARK-16606:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/14533

> Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an 
> existing SparkContext, some configuration may not take effect."
> --
>
> Key: SPARK-16606
> URL: https://issues.apache.org/jira/browse/SPARK-16606
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {{SparkContext.getOrCreate}} should really be checking whether the code gets 
> the already-created instance or creating a new one.
> Just a nit-pick: the warning message should also be "Using..." not "Use"
> {code}
> scala> sc.version
> res2: String = 2.1.0-SNAPSHOT
> scala> sc
> res3: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> scala> SparkContext.getOrCreate
> 16/07/18 14:40:31 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> res4: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> {code}
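
As an illustrative spark-shell check (not the proposed fix), the instance 
returned above really is the pre-existing context, which is exactly the case 
the reworded "Using an existing SparkContext" message should describe:

{code}
// Illustrative only: getOrCreate returns the context already created by
// spark-shell, so the freshly supplied SparkConf is ignored.
import org.apache.spark.{SparkConf, SparkContext}

val before = sc
val returned = SparkContext.getOrCreate(new SparkConf())
assert(returned eq before)   // same instance was reused; no new context was created
{code}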






[jira] [Assigned] (SPARK-16606) Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an existing SparkContext, some configuration may not take effect."

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16606:


Assignee: Apache Spark

> Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an 
> existing SparkContext, some configuration may not take effect."
> --
>
> Key: SPARK-16606
> URL: https://issues.apache.org/jira/browse/SPARK-16606
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jacek Laskowski
>Assignee: Apache Spark
>Priority: Minor
>
> {{SparkContext.getOrCreate}} should really be checking whether the code gets 
> the already-created instance or creating a new one.
> Just a nit-pick: the warning message should also be "Using..." not "Use"
> {code}
> scala> sc.version
> res2: String = 2.1.0-SNAPSHOT
> scala> sc
> res3: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> scala> SparkContext.getOrCreate
> 16/07/18 14:40:31 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> res4: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> {code}






[jira] [Assigned] (SPARK-16606) Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an existing SparkContext, some configuration may not take effect."

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16606:


Assignee: (was: Apache Spark)

> Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an 
> existing SparkContext, some configuration may not take effect."
> --
>
> Key: SPARK-16606
> URL: https://issues.apache.org/jira/browse/SPARK-16606
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {{SparkContext.getOrCreate}} should really be checking whether the code gets 
> the already-created instance or creating a new one.
> Just a nit-pick: the warning message should also be "Using..." not "Use"
> {code}
> scala> sc.version
> res2: String = 2.1.0-SNAPSHOT
> scala> sc
> res3: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> scala> SparkContext.getOrCreate
> 16/07/18 14:40:31 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> res4: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1186374c
> {code}






[jira] [Comment Edited] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411319#comment-15411319
 ] 

Sun Rui edited comment on SPARK-16944 at 8/8/16 5:29 AM:
-

I don't think it can be improved without dynamic allocation, because without 
dynamic allocation you can't know in advance the nodes where the data reside. 
It can be partly compensated for by allocating at least one executor on each 
slave, but that may waste executors.


was (Author: sunrui):
I don't think it can be improved without dynamic allocation, because without 
dynamic allocation you can't know in advance the nodes where the data reside. 
It can be partly compensated for by allocating at least one executor on slaves, 
but that may waste executors.

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411320#comment-15411320
 ] 

Sun Rui commented on SPARK-16944:
-

I don't think it can be improved without dynamic allocation, because without 
dynamic allocation you can't know in advance the nodes where the data reside. 
It can be partly compensated for by allocating at least one executor on slaves, 
but that may waste executors.

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411319#comment-15411319
 ] 

Sun Rui commented on SPARK-16944:
-

I don't think it can be improved without dynamic allocation, because without 
dynamic allocation you can't know in advance the nodes where the data reside. 
It can be partly compensated for by allocating at least one executor on slaves, 
but that may waste executors.

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Issue Comment Deleted] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-16944:

Comment: was deleted

(was: I don't think it can be improved without dynamic allocation, because 
without dynamic allocation you can't know in advance the nodes where the data 
reside. It can be partly compensated for by allocating at least one executor on 
slaves, but that may waste executors.)

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411315#comment-15411315
 ] 

Michael Gummelt commented on SPARK-16944:
-

Yea, we typically call it "delay scheduling".  It was first written about by 
the Spark/Mesos researchers:  
http://elmeleegy.com/khaled/papers/delay_scheduling.pdf

Spark already has `spark.locality.wait`, but that governs how long the task 
scheduler will wait for an executor with the preferred locality to come up.  We 
need a similar concept for waiting for offers to come in, so that we can place 
the executor correctly in the first place.
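
For illustration, a rough sketch of what such an offer-side wait could look 
like. Every name below is hypothetical and only stands in for the real Mesos 
scheduler backend types, which are not shown here:

{code}
// Hypothetical sketch: hold back executor placement until an offer from a
// preferred host arrives, or relax locality once a configurable wait expires.
case class Offer(host: String, cpus: Double, memMb: Int)  // simplified stand-in for a Mesos offer

class LocalityAwarePlacer(preferredHosts: Set[String], maxWaitMs: Long) {
  private val startedAt = System.currentTimeMillis()

  /** Accept immediately if the offer is on a preferred host; otherwise accept
    * only after the wait has expired (analogous in spirit to delay scheduling). */
  def shouldAccept(offer: Offer): Boolean = {
    val preferred = preferredHosts.isEmpty || preferredHosts.contains(offer.host)
    val waitExpired = System.currentTimeMillis() - startedAt >= maxWaitMs
    preferred || waitExpired
  }
}

// Example: prefer the hosts holding the pending tasks' data, waiting up to 3 seconds.
val placer = new LocalityAwarePlacer(Set("host-1", "host-2"), maxWaitMs = 3000L)
placer.shouldAccept(Offer("host-3", cpus = 4.0, memMb = 8192))  // false until the wait expires
{code}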

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411313#comment-15411313
 ] 

Sun Rui commented on SPARK-16944:
-

Not quite sure. I think the Mesos scheduler backend can get the preferred 
locations of the pending tasks and wait to see whether offers come from the 
preferred slaves. Of course, there should be a timeout mechanism so that data 
locality can be relaxed if no offers come from the preferred slaves within a 
reasonable time period.

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Resolved] (SPARK-16919) Configurable update interval for console progress bar

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16919.
---
   Resolution: Fixed
 Assignee: Tejas Patil
Fix Version/s: 2.1.0

Resolved by https://github.com/apache/spark/pull/14507

> Configurable update interval for console progress bar
> -
>
> Key: SPARK-16919
> URL: https://issues.apache.org/jira/browse/SPARK-16919
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Trivial
> Fix For: 2.1.0
>
>
> Currently the update interval for the console progress bar is hardcoded. This 
> can be made configurable for users.
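
For reference, a user-side sketch of setting such an interval once it is 
configurable. The key name below is assumed from the linked pull request and 
should be verified against the merged change:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// "spark.ui.consoleProgress.update.interval" is an assumed key name, not confirmed here.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("progress-bar-interval-demo")
  .set("spark.ui.consoleProgress.update.interval", "500")  // redraw every 500 ms

val sc = SparkContext.getOrCreate(conf)
{code}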






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411309#comment-15411309
 ] 

Michael Gummelt commented on SPARK-16944:
-

Since Mesos is offer based, it's up to the Spark scheduler itself to choose 
which offers have the best locality.  In YARN, I think they tell the resource 
manager about preferences.


> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411307#comment-15411307
 ] 

Michael Gummelt commented on SPARK-16944:
-

I think we can improve both with and without dynamic allocation.  In both 
modes, Mesos is only looking at locality after it's already placed the 
executors. 

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411305#comment-15411305
 ] 

Sean Owen commented on SPARK-16945:
---

Probably does not need a JIRA. We have to do this periodically and the change 
itself is the description, essentially.

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Updated] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16945:
--
Assignee: Weiqing Yang
Priority: Trivial  (was: Minor)

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Assignee: Weiqing Yang
>Priority: Trivial
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Assigned] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16945:


Assignee: (was: Apache Spark)

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Commented] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411303#comment-15411303
 ] 

Apache Spark commented on SPARK-16945:
--

User 'Sherry302' has created a pull request for this issue:
https://github.com/apache/spark/pull/14532

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Assigned] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16945:


Assignee: Apache Spark

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Assignee: Apache Spark
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Comment Edited] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411282#comment-15411282
 ] 

Hyukjin Kwon edited comment on SPARK-16918 at 8/8/16 5:14 AM:
--

FYI, fortunately this seems fine in the current master.

{code}
>>> N = 101
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0')]
{code}


was (Author: hyukjin.kwon):
FYI, fortunately this seems fine in the current master.

{code}
>>> N = 100
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0')]
{code}

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> 

[jira] [Commented] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411300#comment-15411300
 ] 

Hyukjin Kwon commented on SPARK-16918:
--

Oh, thanks for pointing this out. I just did this with 101. Let me edit my 
comment above (from N=100 to N=101).

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411299#comment-15411299
 ] 

Saisai Shao commented on SPARK-16944:
-

Does Mesos have a concept similar to a YARN container, and can it be allocated 
in a locality-preferred way?

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Updated] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Weiqing Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiqing Yang updated SPARK-16945:
-
Component/s: Build

> Fix Java Lint errors
> 
>
> Key: SPARK-16945
> URL: https://issues.apache.org/jira/browse/SPARK-16945
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Weiqing Yang
>Priority: Minor
>
> There are following errors when running dev/lint-java:
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
>  (modifier) RedundantModifier: Redundant 'final' modifier.
> [ERROR] 
> src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224]
>  (sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Created] (SPARK-16945) Fix Java Lint errors

2016-08-07 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-16945:


 Summary: Fix Java Lint errors
 Key: SPARK-16945
 URL: https://issues.apache.org/jira/browse/SPARK-16945
 Project: Spark
  Issue Type: Task
Reporter: Weiqing Yang
Priority: Minor


There are following errors when running dev/lint-java:
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[113,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java:[126,11]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[36,11]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[46,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[74,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[93,13]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java:[106,10]
 (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] 
src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[224] 
(sizes) LineLength: Line is longer than 100 characters (found 104).






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411297#comment-15411297
 ] 

Sun Rui commented on SPARK-16944:
-

[~jerryshao] [~mgummelt]

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Issue Comment Deleted] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-16944:

Comment: was deleted

(was: [~jerryshao] [~mgummelt])

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411295#comment-15411295
 ] 

Sun Rui commented on SPARK-16944:
-

[~jerryshao] [~mgummelt]

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Commented] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411292#comment-15411292
 ] 

Dongjoon Hyun commented on SPARK-16918:
---

By the way, you need to test with N=101. :)

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}






[jira] [Created] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Sun Rui (JIRA)
Sun Rui created SPARK-16944:
---

 Summary: [MESOS] Improve data locality when launching new 
executors when dynamic allocation is enabled
 Key: SPARK-16944
 URL: https://issues.apache.org/jira/browse/SPARK-16944
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Sun Rui


Currently Spark on Yarn supports better data locality by considering the 
preferred locations of the pending tasks when dynamic allocation is enabled. 
Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
if Mesos could also support this feature.

I guess that some logic existing in Yarn could be reused by Mesos.






[jira] [Resolved] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16918.
---
Resolution: Not A Problem

OK, it's probably a duplicate of something else then but I don't know what.

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}






[jira] [Commented] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411289#comment-15411289
 ] 

Dongjoon Hyun commented on SPARK-16918:
---

Oh, fast [~hyukjin.kwon]! :)

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}






[jira] [Commented] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411287#comment-15411287
 ] 

Dongjoon Hyun commented on SPARK-16918:
---

Hi, [~phihungle].
I met the same error in bot 1.6.1 and 1.6.2 with your sample code. And, the 
problem seems to be resolved in Apache Spark 2.0 release.
{code}
>>> N = 101
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', ...
{code}

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411287#comment-15411287
 ] 

Dongjoon Hyun edited comment on SPARK-16918 at 8/8/16 5:03 AM:
---

Hi, [~phihungle].
I met the same error in both 1.6.1 and 1.6.2 with your sample code. The 
problem seems to be resolved in the Apache Spark 2.0 release.
{code}
>>> N = 101
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', ...
{code}


was (Author: dongjoon):
Hi, [~phihungle].
I met the same error in bot 1.6.1 and 1.6.2 with your sample code. And, the 
problem seems to be resolved in Apache Spark 2.0 release.
{code}
>>> N = 101
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', ...
{code}

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16918) Weird error when selecting more than 100 spark udf columns

2016-08-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411282#comment-15411282
 ] 

Hyukjin Kwon commented on SPARK-16918:
--

FYI, fortunately this seems fine in the current master. 

{code}
>>> N = 100
>>> df = sqlContext.createDataFrame([{'value': 0}])
>>> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
>>> range(N)]
>>> df.select(udf_columns).take(1)
[Row((value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0', (value)=u'0', (value)=u'0', 
(value)=u'0')]
{code}

> Weird error when selecting more than 100 spark udf columns
> --
>
> Key: SPARK-16918
> URL: https://issues.apache.org/jira/browse/SPARK-16918
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.6.1
>Reporter: Phi Hung LE
>
> Starting with a simple spark dataframe with only one value, I create N simple 
> udf columns.
> {code}
> N = 100
> df = sqlContext.createDataFrame([{'value': 0}])
> udf_columns = [pyspark.sql.functions.udf(lambda x: 0)('value') for _ in 
> range(N)]
> df.select(udf_columns).take(1)
> {code}
> For N <= 100 this code works perfectly. But as soon as N >= 101, I found the 
> following error
> {code}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 34.0 (TID 50, localhost): java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: PythonUDF#(input[0, LongType])
> at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:239)
> at org.apache.spark.sql.execution.PythonUDF.genCode(python.scala:44)
> at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411277#comment-15411277
 ] 

Michael Gummelt commented on SPARK-11638:
-

This JIRA is complex and a lot of it is out of date.  Can someone briefly 
explain to me what the problem is?  Why do you want bridge networking?



> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark in 
> Mesos on Docker with Bridge networking. Provides patches for Akka Remote to 
> enable Spark driver advertisement using alternative host and port.
> With these settings, it is possible to run Spark Master in a Docker container 
> and have the executors running on Mesos talk back correctly to such Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive orders in the bridged container, Mesos 
> in the container has to register for offers using the IP address of the 
> Agent. Offers are sent by Mesos Master to the Docker container running on a 
> different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} 
> would advertise itself using the IP address of the container, something like 
> {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a 
> different host, it's a different machine. Mesos 0.24.0 introduced two new 
> properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and 
> {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's 
> address to register for offers. This was provided mainly for running Mesos in 
> Docker on Mesos.
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box Spark does not allow advertising its 
> services on ports different from the ones it binds to. Consider the following scenario:
> Spark is running inside a Docker container on Mesos, in bridge networking 
> mode. Assume a port {{}} for the {{spark.driver.port}}, {{6677}} for 
> the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and 
> {{23456}} for the {{spark.replClassServer.port}}. If such a task is posted to 
> Marathon, Mesos will assign 4 ports in the range {{31000-32000}} mapping to the 
> container ports. Starting the executors from such a container results in the 
> executors not being able to communicate back to the Spark Master.
> This happens because of 2 things:
> Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} 
> transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port 
> different to what it bound to. The settings discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. Spark driver will always advertise 
> port {{}} as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on, are prepared by Spark 
> Master and handed over to executors. These always contain the port number 
> used by the Master to find the service on. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> All of the above ports are {{0}} by default (random assignment) but can be specified 
> using Spark configuration ( {{-Dspark...port}} ). However, they are limited 
> in the same way as {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective 31xxx port assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, Spark Driver is based on {{akka-remote}}. In order to 
> take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and 
> {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile 
> with Akka 2.4.x yet.
> What we want is the back port of mentioned {{akka-remote}} settings to 
> {{2.3.x}} versions. These patches are attached to this ticket - 
> {{2.3.4.patch}} and 
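
For illustration, a minimal sketch of how the proposed settings could be used. The {{advertisedPort}} keys below are the settings proposed by this ticket's patches, not options that exist in stock Spark, and the concrete port numbers are only examples:

{code}
// Sketch only: the advertised* keys come from this ticket's patches and are not
// available in stock Spark; the port numbers are illustrative.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.port", "7001")                      // port bound inside the container
  .set("spark.driver.advertisedPort", "31001")           // host port Mesos mapped to it
  .set("spark.fileserver.port", "6677")
  .set("spark.fileserver.advertisedPort", "31002")
  .set("spark.broadcast.port", "6688")
  .set("spark.broadcast.advertisedPort", "31003")
  .set("spark.replClassServer.port", "23456")
  .set("spark.replClassServer.advertisedPort", "31004")
{code}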

[jira] [Assigned] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16942:


Assignee: (was: Apache Spark)

> CREATE TABLE LIKE generates External table when source table is an External 
> Hive Serde table
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
> LIKE}} will generate an EXTERNAL table. The expected table type should 
> be MANAGED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411264#comment-15411264
 ] 

Apache Spark commented on SPARK-16942:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14531

> CREATE TABLE LIKE generates External table when source table is an External 
> Hive Serde table
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
> LIKE}} will generate an EXTERNAL table. The expected table type should 
> be MANAGED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16942:


Assignee: Apache Spark

> CREATE TABLE LIKE generates External table when source table is an External 
> Hive Serde table
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
> LIKE}} will generate an EXTERNAL table. The expected table type should 
> be MANAGED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16943:


Assignee: Apache Spark

> CREATE TABLE LIKE generates a non-empty table when source is a data source 
> table
> 
>
> Key: SPARK-16943
> URL: https://issues.apache.org/jira/browse/SPARK-16943
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> When the source table is a data source table, the table generated by CREATE 
> TABLE LIKE is non-empty. The expected table should be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411258#comment-15411258
 ] 

Apache Spark commented on SPARK-16943:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14531

> CREATE TABLE LIKE generates a non-empty table when source is a data source 
> table
> 
>
> Key: SPARK-16943
> URL: https://issues.apache.org/jira/browse/SPARK-16943
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is a data source table, the table generated by CREATE 
> TABLE LIKE is non-empty. The expected table should be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16943:


Assignee: (was: Apache Spark)

> CREATE TABLE LIKE generates a non-empty table when source is a data source 
> table
> 
>
> Key: SPARK-16943
> URL: https://issues.apache.org/jira/browse/SPARK-16943
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is a data source table, the table generated by CREATE 
> TABLE LIKE is non-empty. The expected table should be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is External

2016-08-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-16942:

Summary: CREATE TABLE LIKE generates External table when source table is 
External  (was: CREATE TABLE LIKE generates External table when Source table is 
External)

> CREATE TABLE LIKE generates External table when source table is External
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
> LIKE}} will generate an EXTERNAL table. The expected table type should 
> be MANAGED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table

2016-08-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-16942:

Summary: CREATE TABLE LIKE generates External table when source table is an 
External Hive Serde table  (was: CREATE TABLE LIKE generates External table 
when source table is External)

> CREATE TABLE LIKE generates External table when source table is an External 
> Hive Serde table
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
> LIKE}} will generate an EXTERNAL table. The expected table type should 
> be MANAGED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table

2016-08-07 Thread Xiao Li (JIRA)
Xiao Li created SPARK-16943:
---

 Summary: CREATE TABLE LIKE generates a non-empty table when source 
is a data source table
 Key: SPARK-16943
 URL: https://issues.apache.org/jira/browse/SPARK-16943
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


When the source table is a data source table, the table generated by CREATE 
TABLE LIKE is non-empty. The expected table should be empty.
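
A minimal reproduction sketch (hypothetical table names, assuming a Spark 2.0 session):

{code}
// Sketch only: create a data source table, copy it with CREATE TABLE LIKE,
// and check that the copy is empty, which is the expected behavior.
spark.range(10).write.saveAsTable("src_datasource_tbl")    // data source table with 10 rows
spark.sql("CREATE TABLE tgt_tbl LIKE src_datasource_tbl")
assert(spark.table("tgt_tbl").count() == 0)                // expected: schema only, no data
{code}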



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16942) CREATE TABLE LIKE generates External table when Source table is External

2016-08-07 Thread Xiao Li (JIRA)
Xiao Li created SPARK-16942:
---

 Summary: CREATE TABLE LIKE generates External table when Source 
table is External
 Key: SPARK-16942
 URL: https://issues.apache.org/jira/browse/SPARK-16942
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE 
LIKE}} will generate an EXTERNAL table. The expected table type should be 
MANAGED.
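
A minimal check sketch (hypothetical table names; assumes an existing external Hive serde table and a session with Hive support):

{code}
// Sketch only: copy an existing EXTERNAL Hive serde table and inspect the copy;
// per this report the "Table Type" shows EXTERNAL instead of the expected MANAGED.
spark.sql("CREATE TABLE copy_tbl LIKE ext_hive_serde_tbl")
spark.sql("DESCRIBE FORMATTED copy_tbl").show(100, false)  // check the Table Type line
{code}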



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16941) SparkSQLOperationManager should use a synchronized map to store SessionHandle

2016-08-07 Thread carlmartin (JIRA)
carlmartin created SPARK-16941:
--

 Summary: SparkSQLOperationManager should use a synchronized map 
to store SessionHandle
 Key: SPARK-16941
 URL: https://issues.apache.org/jira/browse/SPARK-16941
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0, 1.5.1
Reporter: carlmartin
Priority: Minor


When I ran a high-concurrency SQL query through the Thrift server, I hit this 
error without any preceding error:
*Error: java.util.NoSuchElementException: key not found: SessionHandle 
[a2ea264f-d29d-43c4-842f-4e0f2a3cf877] (state=,code=0)*

So I checked the code in *SparkSQLOperationManager* and found that the maps 
(*sessionToContexts* and *sessionToActivePool*) in it are not thread safe.
I hit this error in version 1.5.1, but I think the latest master branch still 
has it.
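
A minimal sketch of the kind of change meant here (field and type names follow the description above; the actual fix may differ):

{code}
// Sketch only: back the session maps with thread-safe ConcurrentHashMaps so
// concurrent sessions cannot hit "key not found: SessionHandle ...".
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import org.apache.hive.service.cli.SessionHandle
import org.apache.spark.sql.SQLContext

val sessionToContexts   = new ConcurrentHashMap[SessionHandle, SQLContext]().asScala
val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]().asScala
{code}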





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16666) Kryo encoder for custom complex classes

2016-08-07 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411207#comment-15411207
 ] 

Wenchen Fan commented on SPARK-16666:
-

[~samehraban], `Queryable.scala` from your stack trace doesn't exist anymore; 
can you try it with the latest code?

> Kryo encoder for custom complex classes
> ---
>
> Key: SPARK-16666
> URL: https://issues.apache.org/jira/browse/SPARK-16666
> Project: Spark
>  Issue Type: Question
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Sam
>
> I'm trying to create a dataset with some geo data using spark and esri. If 
> `Foo` only have `Point` field, it'll work but if I add some other fields 
> beyond a `Point`, I get ArrayIndexOutOfBoundsException.
> {code:scala}
> import com.esri.core.geometry.Point
> import org.apache.spark.sql.{Encoder, Encoders, SQLContext}
> import org.apache.spark.{SparkConf, SparkContext}
> 
> object Main {
> 
>   case class Foo(position: Point, name: String)
> 
>   object MyEncoders {
> implicit def PointEncoder: Encoder[Point] = Encoders.kryo[Point]
> 
> implicit def FooEncoder: Encoder[Foo] = Encoders.kryo[Foo]
>   }
> 
>   def main(args: Array[String]): Unit = {
> val sc = new SparkContext(new 
> SparkConf().setAppName("app").setMaster("local"))
> val sqlContext = new SQLContext(sc)
> import MyEncoders.{FooEncoder, PointEncoder}
> import sqlContext.implicits._
> Seq(new Foo(new Point(0, 0), "bar")).toDS.show
>   }
> }
> {code}
> {noformat}
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.Queryable$$anonfun$formatString$1$$anonfun$apply$2.apply(Queryable.scala:71)
> at 
> org.apache.spark.sql.execution.Queryable$$anonfun$formatString$1$$anonfun$apply$2.apply(Queryable.scala:70)
>  
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>  
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>  
> at 
> org.apache.spark.sql.execution.Queryable$$anonfun$formatString$1.apply(Queryable.scala:70)
>  
> at 
> org.apache.spark.sql.execution.Queryable$$anonfun$formatString$1.apply(Queryable.scala:69)
>  
> at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:73) 
> at 
> org.apache.spark.sql.execution.Queryable$class.formatString(Queryable.scala:69)
>  
> at org.apache.spark.sql.Dataset.formatString(Dataset.scala:65) 
> at org.apache.spark.sql.Dataset.showString(Dataset.scala:263) 
> at org.apache.spark.sql.Dataset.show(Dataset.scala:230) 
> at org.apache.spark.sql.Dataset.show(Dataset.scala:193) 
> at org.apache.spark.sql.Dataset.show(Dataset.scala:201) 
> at Main$.main(Main.scala:24) 
> at Main.main(Main.scala)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15899) file scheme should be used correctly

2016-08-07 Thread xuyifei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411190#comment-15411190
 ] 

xuyifei commented on SPARK-15899:
-

Really appreciate it, it works. (Sorry for the late reply.)

> file scheme should be used correctly
> 
>
> Key: SPARK-15899
> URL: https://issues.apache.org/jira/browse/SPARK-15899
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Kazuaki Ishizaki
>Assignee: Alexander Ulanov
>
> [A RFC|https://www.ietf.org/rfc/rfc1738.txt] defines file scheme as 
> {{file://host/}} or {{file:///}}. 
> [Wikipedia|https://en.wikipedia.org/wiki/File_URI_scheme]
> [Some code 
> stuffs|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L58]
>  use different prefix such as {{file:}}.
> It would be good to prepare a utility method to correctly add the {{file://host}} 
> or {{file:///}} prefix.
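
For instance, a minimal sketch of such a utility (an illustration only, not Spark's actual helper):

{code}
// Sketch only: normalize a bare local path or a "file:"-prefixed path into the
// canonical RFC 1738 form file:///absolute/path; non-file schemes are out of scope.
import java.nio.file.Paths

def toFileUri(path: String): String = {
  val stripped = path.stripPrefix("file://").stripPrefix("file:")
  Paths.get(stripped).toAbsolutePath.toUri.toString      // java.nio renders file:///...
}

// toFileUri("/tmp/data")      -> "file:///tmp/data"
// toFileUri("file:/tmp/data") -> "file:///tmp/data"
{code}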



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16868) Executor will be both dead and alive when this executor reregister itself to driver.

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411180#comment-15411180
 ] 

Apache Spark commented on SPARK-16868:
--

User 'SaintBacchus' has created a pull request for this issue:
https://github.com/apache/spark/pull/14530

> Executor will be both dead and alive when this executor reregister itself to 
> driver.
> 
>
> Key: SPARK-16868
> URL: https://issues.apache.org/jira/browse/SPARK-16868
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: carlmartin
>Priority: Minor
> Attachments: 2016-8-3 15-41-47.jpg, 2016-8-3 15-51-13.jpg
>
>
> In a rare condition, an Executor will register its block manager twice.
> !https://issues.apache.org/jira/secure/attachment/12821794/2016-8-3%2015-41-47.jpg!
> When it is unregistered from BlockManagerMaster, the driver marks it as "DEAD" in 
> the executors WebUI.
> But when the heartbeat re-registers the block manager again, this executor will 
> also show another status, "Active".
> !https://issues.apache.org/jira/secure/attachment/12821795/2016-8-3%2015-51-13.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16868) Executor will be both dead and alive when this executor reregister itself to driver.

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16868:


Assignee: (was: Apache Spark)

> Executor will be both dead and alive when this executor reregister itself to 
> driver.
> 
>
> Key: SPARK-16868
> URL: https://issues.apache.org/jira/browse/SPARK-16868
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: carlmartin
>Priority: Minor
> Attachments: 2016-8-3 15-41-47.jpg, 2016-8-3 15-51-13.jpg
>
>
> In a rare condition, an Executor will register its block manager twice.
> !https://issues.apache.org/jira/secure/attachment/12821794/2016-8-3%2015-41-47.jpg!
> When it is unregistered from BlockManagerMaster, the driver marks it as "DEAD" in 
> the executors WebUI.
> But when the heartbeat re-registers the block manager again, this executor will 
> also show another status, "Active".
> !https://issues.apache.org/jira/secure/attachment/12821795/2016-8-3%2015-51-13.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16868) Executor will be both dead and alive when this executor reregister itself to driver.

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16868:


Assignee: Apache Spark

> Executor will be both dead and alive when this executor reregister itself to 
> driver.
> 
>
> Key: SPARK-16868
> URL: https://issues.apache.org/jira/browse/SPARK-16868
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: carlmartin
>Assignee: Apache Spark
>Priority: Minor
> Attachments: 2016-8-3 15-41-47.jpg, 2016-8-3 15-51-13.jpg
>
>
> In a rare condition, an Executor will register its block manager twice.
> !https://issues.apache.org/jira/secure/attachment/12821794/2016-8-3%2015-41-47.jpg!
> When it is unregistered from BlockManagerMaster, the driver marks it as "DEAD" in 
> the executors WebUI.
> But when the heartbeat re-registers the block manager again, this executor will 
> also show another status, "Active".
> !https://issues.apache.org/jira/secure/attachment/12821795/2016-8-3%2015-51-13.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16914) NodeManager crash when Spark is registering executor information into leveldb

2016-08-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411159#comment-15411159
 ] 

Saisai Shao edited comment on SPARK-16914 at 8/8/16 1:48 AM:
-

So from your description, is this exception mainly due to a problem with disk1 
that leveldb fails to write data into?

Maybe this JIRA SPARK-14963 could address your problem; it uses the NM's recovery 
dir to store the aux-service data. And I guess the NM will handle this disk failure 
if you configure multiple disks for the NM local dir.


was (Author: jerryshao):
So from your description, is this exception mainly due to the problem of disk1 
that leveldb fail to write data into it?

Maybe this JIRA SPARK-16917 could address your problem, it uses NM's recovery 
dir to store aux-service data. And I guess NM will handle this disk failure 
problem if you configure multiple disks for NM local dir.

> NodeManager crash when Spark is registering executor information into leveldb
> --
>
> Key: SPARK-16914
> URL: https://issues.apache.org/jira/browse/SPARK-16914
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 1.6.2
>Reporter: cen yuhai
>
> {noformat}
> Stack: [0x7fb5b53de000,0x7fb5b54df000],  sp=0x7fb5b54dcba8,  free 
> space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [libc.so.6+0x896b1]  memcpy+0x11
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  
> org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
> j  
> org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
> j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
> j  
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
> J 8429 C2 
> org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V
>  (100 bytes) @ 0x7fb5f27ff6cc [0x7fb5f27fdde0+0x18ec]
> J 8371 C2 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (10 bytes) @ 0x7fb5f242df20 [0x7fb5f242de80+0xa0]
> J 6853 C2 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (74 bytes) @ 0x7fb5f215587c [0x7fb5f21557e0+0x9c]
> J 5872 C2 
> io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (42 bytes) @ 0x7fb5f2183268 [0x7fb5f2183100+0x168]
> J 5849 C2 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (158 bytes) @ 0x7fb5f2191524 [0x7fb5f218f5a0+0x1f84]
> J 5941 C2 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (170 bytes) @ 0x7fb5f220a230 [0x7fb5f2209fc0+0x270]
> J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V 
> (363 bytes) @ 0x7fb5f264465c [0x7fb5f2644140+0x51c]
> J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 
> 0x7fb5f26f6764 [0x7fb5f26f63c0+0x3a4]
> j  io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {noformat}
> The target code in spark is in ExternalShuffleBlockResolver
> {code}
>   /** Registers a new Executor with all the configuration we need to find its 
> shuffle files. */
>   public void registerExecutor(
>   String appId,
>   String execId,
>   ExecutorShuffleInfo executorInfo) {
> AppExecId fullId = new AppExecId(appId, execId);
> logger.info("Registered executor {} with {}", fullId, executorInfo);
> try {
>   if (db != null) {
> byte[] key = dbAppExecKey(fullId);
> byte[] value =  
> 

[jira] [Commented] (SPARK-16914) NodeManager crash when Spark is registering executor information into leveldb

2016-08-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411159#comment-15411159
 ] 

Saisai Shao commented on SPARK-16914:
-

So from your description, is this exception mainly due to a problem with disk1 
that leveldb fails to write data into?

Maybe this JIRA SPARK-16917 could address your problem; it uses the NM's recovery 
dir to store the aux-service data. And I guess the NM will handle this disk failure 
if you configure multiple disks for the NM local dir.

> NodeManager crash when Spark is registering executor information into leveldb
> --
>
> Key: SPARK-16914
> URL: https://issues.apache.org/jira/browse/SPARK-16914
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 1.6.2
>Reporter: cen yuhai
>
> {noformat}
> Stack: [0x7fb5b53de000,0x7fb5b54df000],  sp=0x7fb5b54dcba8,  free 
> space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [libc.so.6+0x896b1]  memcpy+0x11
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  
> org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
> j  
> org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
> j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
> j  
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
> J 8429 C2 
> org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V
>  (100 bytes) @ 0x7fb5f27ff6cc [0x7fb5f27fdde0+0x18ec]
> J 8371 C2 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (10 bytes) @ 0x7fb5f242df20 [0x7fb5f242de80+0xa0]
> J 6853 C2 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (74 bytes) @ 0x7fb5f215587c [0x7fb5f21557e0+0x9c]
> J 5872 C2 
> io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (42 bytes) @ 0x7fb5f2183268 [0x7fb5f2183100+0x168]
> J 5849 C2 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (158 bytes) @ 0x7fb5f2191524 [0x7fb5f218f5a0+0x1f84]
> J 5941 C2 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (170 bytes) @ 0x7fb5f220a230 [0x7fb5f2209fc0+0x270]
> J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V 
> (363 bytes) @ 0x7fb5f264465c [0x7fb5f2644140+0x51c]
> J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 
> 0x7fb5f26f6764 [0x7fb5f26f63c0+0x3a4]
> j  io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {noformat}
> The target code in spark is in ExternalShuffleBlockResolver
> {code}
>   /** Registers a new Executor with all the configuration we need to find its 
> shuffle files. */
>   public void registerExecutor(
>   String appId,
>   String execId,
>   ExecutorShuffleInfo executorInfo) {
> AppExecId fullId = new AppExecId(appId, execId);
> logger.info("Registered executor {} with {}", fullId, executorInfo);
> try {
>   if (db != null) {
> byte[] key = dbAppExecKey(fullId);
> byte[] value =  
> mapper.writeValueAsString(executorInfo).getBytes(Charsets.UTF_8);
> db.put(key, value);
>   }
> } catch (Exception e) {
>   logger.error("Error saving registered executors", e);
> }
> executors.put(fullId, executorInfo);
>   }
> {code}
> There is a problem with disk1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To 

[jira] [Commented] (SPARK-16940) `checkAnswer` should raise `TestFailedException` for wrong results

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411145#comment-15411145
 ] 

Apache Spark commented on SPARK-16940:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14528

> `checkAnswer` should raise `TestFailedException` for wrong results
> --
>
> Key: SPARK-16940
> URL: https://issues.apache.org/jira/browse/SPARK-16940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue fixes the following to make `checkAnswer` raise 
> `TestFailedException` again instead of `java.util.NoSuchElementException: key 
> not found: TZ` in the environments without `TZ` variable. Also, this issue 
> adds `QueryTestSuite` class for testing `QueryTest` itself.
> {code}
> - |Timezone Env: ${sys.env("TZ")}
> + |Timezone Env: ${sys.env.getOrElse("TZ", "")}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16940) `checkAnswer` should raise `TestFailedException` for wrong results

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16940:


Assignee: (was: Apache Spark)

> `checkAnswer` should raise `TestFailedException` for wrong results
> --
>
> Key: SPARK-16940
> URL: https://issues.apache.org/jira/browse/SPARK-16940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue fixes the following to make `checkAnswer` raise 
> `TestFailedException` again instead of `java.util.NoSuchElementException: key 
> not found: TZ` in the environments without `TZ` variable. Also, this issue 
> adds `QueryTestSuite` class for testing `QueryTest` itself.
> {code}
> - |Timezone Env: ${sys.env("TZ")}
> + |Timezone Env: ${sys.env.getOrElse("TZ", "")}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16940) `checkAnswer` should raise `TestFailedException` for wrong results

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16940:


Assignee: Apache Spark

> `checkAnswer` should raise `TestFailedException` for wrong results
> --
>
> Key: SPARK-16940
> URL: https://issues.apache.org/jira/browse/SPARK-16940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> This issue fixes the following to make `checkAnswer` raise 
> `TestFailedException` again instead of `java.util.NoSuchElementException: key 
> not found: TZ` in the environments without `TZ` variable. Also, this issue 
> adds `QueryTestSuite` class for testing `QueryTest` itself.
> {code}
> - |Timezone Env: ${sys.env("TZ")}
> + |Timezone Env: ${sys.env.getOrElse("TZ", "")}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16940) `checkAnswer` should raise `TestFailedException` for wrong results

2016-08-07 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-16940:
-

 Summary: `checkAnswer` should raise `TestFailedException` for 
wrong results
 Key: SPARK-16940
 URL: https://issues.apache.org/jira/browse/SPARK-16940
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Dongjoon Hyun
Priority: Minor


This issue fixes the following to make `checkAnswer` raise 
`TestFailedException` again instead of `java.util.NoSuchElementException: key 
not found: TZ` in the environments without `TZ` variable. Also, this issue adds 
`QueryTestSuite` class for testing `QueryTest` itself.

{code}
- |Timezone Env: ${sys.env("TZ")}
+ |Timezone Env: ${sys.env.getOrElse("TZ", "")}
{code}
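
A minimal illustration of the difference, in plain Scala outside any Spark test:

{code}
// sys.env is an immutable Map[String, String]: apply() throws
// NoSuchElementException when the variable is absent, getOrElse does not.
val direct  = scala.util.Try(sys.env("TZ"))    // Failure(NoSuchElementException) if TZ is unset
val guarded = sys.env.getOrElse("TZ", "")      // "" if TZ is unset, so the message still builds
println(s"Timezone Env: $guarded")
{code}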



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16937) Confusing behaviors when View and Temp View sharing the same names

2016-08-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411137#comment-15411137
 ] 

Xiao Li commented on SPARK-16937:
-

Sure, will do it. Thanks!

> Confusing behaviors when View and Temp View sharing the same names
> --
>
> Key: SPARK-16937
> URL: https://issues.apache.org/jira/browse/SPARK-16937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> {noformat}
> sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
> ID < 3")
> sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
> // This returns the contents of the temp view.
> sql(s"select * from $viewName").show(false)
> // This returns the contents of the view.
> sql(s"select * from default.$viewName").show(false)
> // Below is to drop the temp view
> sql(s"DROP VIEW $viewName")
> // Both results are non-temp view
> sql(s"select * from $viewName").show(false)
> sql(s"select * from default.$viewName").show(false)
> // After another drop, the non-temp view is dropped.
> sql(s"DROP VIEW $viewName")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16937) Confusing behaviors when View and Temp View sharing the same names

2016-08-07 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411134#comment-15411134
 ] 

Wenchen Fan commented on SPARK-16937:
-

For the problem described in the JIRA, yeah, it seems our rule is not so friendly, 
but we need to be very careful when changing rules.
For the ALTER VIEW, SGTM; can you open a new JIRA?

> Confusing behaviors when View and Temp View sharing the same names
> --
>
> Key: SPARK-16937
> URL: https://issues.apache.org/jira/browse/SPARK-16937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> {noformat}
> sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
> ID < 3")
> sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
> // This returns the contents of the temp view.
> sql(s"select * from $viewName").show(false)
> // This returns the contents of the view.
> sql(s"select * from default.$viewName").show(false)
> // Below is to drop the temp view
> sql(s"DROP VIEW $viewName")
> // Both results are non-temp view
> sql(s"select * from $viewName").show(false)
> sql(s"select * from default.$viewName").show(false)
> // After another drop, the non-temp view is dropped.
> sql(s"DROP VIEW $viewName")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-08-07 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411132#comment-15411132
 ] 

Junyang Qian commented on SPARK-16508:
--

Sounds good. I'll be working on the undocumented/duplicated argument warnings. 

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16938:


Assignee: Apache Spark

> Cannot resolve column name after a join
> ---
>
> Key: SPARK-16938
> URL: https://issues.apache.org/jira/browse/SPARK-16938
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Mathieu D
>Assignee: Apache Spark
>Priority: Minor
>
> Found a change of behavior on spark-2.0.0, which breaks a query in our code 
> base.
> The following works on previous spark versions, 1.6.1 up to 2.0.0-preview :
> {code}
> val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
> val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
> dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", 
> "dfb.id"))
> {code}
> but fails with spark-2.0.0 with the exception : 
> {code}
> Cannot resolve column name "dfa.id" among (id, a, id, b); 
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" 
> among (id, a, id, b);
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16938:


Assignee: (was: Apache Spark)

> Cannot resolve column name after a join
> ---
>
> Key: SPARK-16938
> URL: https://issues.apache.org/jira/browse/SPARK-16938
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Mathieu D
>Priority: Minor
>
> Found a change of behavior on spark-2.0.0, which breaks a query in our code 
> base.
> The following works on previous spark versions, 1.6.1 up to 2.0.0-preview :
> {code}
> val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
> val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
> dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", 
> "dfb.id"))
> {code}
> but fails with spark-2.0.0 with the exception : 
> {code}
> Cannot resolve column name "dfa.id" among (id, a, id, b); 
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" 
> among (id, a, id, b);
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1545#comment-1545
 ] 

Apache Spark commented on SPARK-16938:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14527

> Cannot resolve column name after a join
> ---
>
> Key: SPARK-16938
> URL: https://issues.apache.org/jira/browse/SPARK-16938
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Mathieu D
>Priority: Minor
>
> Found a change of behavior on spark-2.0.0, which breaks a query in our code 
> base.
> The following works on previous spark versions, 1.6.1 up to 2.0.0-preview :
> {code}
> val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
> val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
> dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", 
> "dfb.id"))
> {code}
> but fails with spark-2.0.0 with the exception : 
> {code}
> Cannot resolve column name "dfa.id" among (id, a, id, b); 
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" 
> among (id, a, id, b);
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12920) Fix high CPU usage in spark thrift server with concurrent users

2016-08-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-12920:
-
Summary: Fix high CPU usage in spark thrift server with concurrent users  
(was: Spark thrift server can run at very high CPU with concurrent users)

> Fix high CPU usage in spark thrift server with concurrent users
> ---
>
> Key: SPARK-12920
> URL: https://issues.apache.org/jira/browse/SPARK-12920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
> Attachments: SPARK-12920.profiler.png, 
> SPARK-12920.profiler_job_progress_listner.png
>
>
> - Configured with fair-share-scheduler.
> - 4-5 users submitting/running jobs concurrently via spark-thrift-server
> - Spark thrift server spikes to 1600+% CPU and stays there for a long time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411106#comment-15411106
 ] 

Dongjoon Hyun commented on SPARK-16938:
---

Hi, [~mathieude].

This bug seems to have been introduced by SPARK-15230 (`distinct() does not handle 
column name with dot`):

https://github.com/apache/spark/commit/925884a612dd88beaddf555c74d90856ab040ec7

I'll make a PR soon.
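
In the meantime, a minimal workaround sketch (not the fix in the PR, and it assumes 
{{spark.implicits._}} is imported): alias the ambiguous {{id}} columns right after 
the join so that {{dropDuplicates}} can resolve plain column names without the 
{{dfa.}}/{{dfb.}} qualifiers.

{code}
// Workaround sketch (assumed, not the actual fix): disambiguate the joined
// columns explicitly before deduplicating.
val dfa = Seq((1, 2), (2, 3)).toDF("id", "a")
val dfb = Seq((1, 0), (1, 1)).toDF("id", "b")
val joined = dfa.join(dfb, dfa("id") === dfb("id"))
  .select(dfa("id").as("dfa_id"), dfa("a"), dfb("id").as("dfb_id"), dfb("b"))
joined.dropDuplicates(Array("dfa_id", "dfb_id")).show()
{code}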

> Cannot resolve column name after a join
> ---
>
> Key: SPARK-16938
> URL: https://issues.apache.org/jira/browse/SPARK-16938
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Mathieu D
>Priority: Minor
>
> Found a change of behavior on spark-2.0.0, which breaks a query in our code 
> base.
> The following works on previous spark versions, 1.6.1 up to 2.0.0-preview :
> {code}
> val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
> val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
> dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", 
> "dfb.id"))
> {code}
> but fails with spark-2.0.0 with the exception : 
> {code}
> Cannot resolve column name "dfa.id" among (id, a, id, b); 
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" 
> among (id, a, id, b);
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
>   at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15899) file scheme should be used correctly

2016-08-07 Thread Bruno C Faria (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411052#comment-15411052
 ] 

Bruno C Faria commented on SPARK-15899:
---

I have used System.setProperty("spark.sql.warehouse.dir", "file:///C:/temp") 
within my project (built with Maven) using Scala IDE on Windows, and it also worked.
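
For reference, a minimal sketch of the same setting done through the SparkSession 
builder instead of a system property (the app name and path below are assumptions, 
not from the original report):

{code}
// Minimal sketch (assumed values): configure the warehouse dir on the builder.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("warehouse-dir-example")      // hypothetical app name
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/temp")
  .getOrCreate()
{code}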

> file scheme should be used correctly
> 
>
> Key: SPARK-15899
> URL: https://issues.apache.org/jira/browse/SPARK-15899
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Kazuaki Ishizaki
>Assignee: Alexander Ulanov
>
> [An RFC|https://www.ietf.org/rfc/rfc1738.txt] defines the file scheme as 
> {{file://host/}} or {{file:///}} 
> ([Wikipedia|https://en.wikipedia.org/wiki/File_URI_scheme]).
> [Some code 
> paths|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L58]
>  use a different prefix such as {{file:}}.
> It would be good to prepare a utility method to correctly add a {{file://host/}} 
> or {{file:///}} prefix.
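
A rough sketch of the kind of helper the description asks for (hypothetical; no such 
method exists in Spark today):

{code}
// Hypothetical helper: normalize a local path into a proper file: URI.
import java.nio.file.Paths

def toFileUri(path: String): String =
  if (path.startsWith("file:")) path
  else Paths.get(path).toAbsolutePath.toUri.toString  // e.g. file:///tmp/warehouse
{code}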



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16939) Fix build error by using `Tuple1` explicitly in StringFunctionSuite

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16939.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 1.6.3

Resolved by https://github.com/apache/spark/pull/14526

> Fix build error by using `Tuple1` explicitly in StringFunctionSuite
> ---
>
> Key: SPARK-16939
> URL: https://issues.apache.org/jira/browse/SPARK-16939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 1.6.3
>
>
> This issue aims to fix a build error on branch 1.6, but we should have this 
> in the master branch, too. There are other ongoing PRs as well.
> {code}
> [error] 
> /home/jenkins/workspace/spark-branch-1.6-compile-maven-with-yarn-2.3/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala:82:
>  value toDF is not a member of Seq[String]
> [error] val df = Seq("c").toDF("s")
> [error]   ^
> {code}
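
For context, a minimal sketch of the workaround the title describes (assuming 
{{spark.implicits._}} is in scope): wrapping the single value in {{Tuple1}} gives 
the implicit {{toDF}} conversion a Product to work with.

{code}
// Seq[String] has no toDF, but Seq[Tuple1[String]] does.
val df = Seq(Tuple1("c")).toDF("s")
{code}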



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6932) A Prototype of Parameter Server

2016-08-07 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411023#comment-15411023
 ] 

Debasish Das commented on SPARK-6932:
-

[~rxin] [~sowen] Do we have any other active parameter server effort going on 
other than the glint project from Rolf? I have started to look into glint to scale 
Spark-as-a-Service for query processing (the idea is to keep the Spark master as 
a coordinator, with zero compute happening on the master other than coordination 
through messages; in our implementation right now, compute happens on the master, 
which is a major drawback). More details will be covered in the talk 
https://spark-summit.org/eu-2016/events/fusing-apache-spark-and-lucene-for-near-realtime-predictive-model-building/
but I believe a parameter server (or something similar) will be needed to scale 
query processing further, for example to a Cassandra ring architecture. We will 
provide our implementation of the spark-lucene integration as open source, as part 
of our framework (Trapezium).


> A Prototype of Parameter Server
> ---
>
> Key: SPARK-6932
> URL: https://issues.apache.org/jira/browse/SPARK-6932
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib, Spark Core
>Reporter: Qiping Li
>
>  h2. Introduction
> As specified in 
> [SPARK-4590|https://issues.apache.org/jira/browse/SPARK-4590],it would be 
> very helpful to integrate parameter server into Spark for machine learning 
> algorithms, especially for those with ultra high dimensions features. 
> After carefully studying the design doc of [Parameter 
> Servers|https://docs.google.com/document/d/1SX3nkmF41wFXAAIr9BgqvrHSS5mW362fJ7roBXJm06o/edit?usp=sharing],and
>  the paper of [Factorbird|http://stanford.edu/~rezab/papers/factorbird.pdf], 
> we proposed a prototype of Parameter Server on Spark(Ps-on-Spark), with 
> several key design concerns:
> * *User friendly interface*
>   Careful investigation is done to most existing Parameter Server 
> systems(including:  [petuum|http://petuum.github.io], [parameter 
> server|http://parameterserver.org], 
> [paracel|https://github.com/douban/paracel]) and a user friendly interface is 
> design by absorbing essence from all these system. 
> * *Prototype of distributed array*
> IndexRDD (see 
> [SPARK-4590|https://issues.apache.org/jira/browse/SPARK-4590]) doesn't seem 
> to be a good option for distributed array, because in most case, the #key 
> updates/second is not be very high. 
> So we implement a distributed HashMap to store the parameters, which can 
> be easily extended to get better performance.
> 
> * *Minimal code change*
>   Quite a lot of effort in done to avoid code change of Spark core. Tasks 
> which need parameter server are still created and scheduled by Spark's 
> scheduler. Tasks communicate with parameter server with a client object, 
> through *akka* or *netty*.
> With all these concerns we propose the following architecture:
> h2. Architecture
> !https://cloud.githubusercontent.com/assets/1285855/7158179/f2d25cc4-e3a9-11e4-835e-89681596c478.jpg!
> Data is stored in RDD and is partitioned across workers. During each 
> iteration, each worker gets parameters from parameter server then computes 
> new parameters based on old parameters and data in the partition. Finally 
> each worker updates parameters to parameter server.Worker communicates with 
> parameter server through a parameter server client,which is initialized in 
> `TaskContext` of this worker.
> The current implementation is based on YARN cluster mode, 
> but it should not be a problem to transplanted it to other modes. 
> h3. Interface
> We refer to existing parameter server systems(petuum, parameter server, 
> paracel) when design the interface of parameter server. 
> *`PSClient` provides the following interface for workers to use:*
> {code}
> //  get parameter indexed by key from parameter server
> def get[T](key: String): T
> // get multiple parameters from parameter server
> def multiGet[T](keys: Array[String]): Array[T]
> // add parameter indexed by `key` by `delta`, 
> // if multiple `delta` to update on the same parameter,
> // use `reduceFunc` to reduce these `delta`s frist.
> def update[T](key: String, delta: T, reduceFunc: (T, T) => T): Unit
> // update multiple parameters at the same time, use the same `reduceFunc`.
> def multiUpdate(keys: Array[String], delta: Array[T], reduceFunc: (T, T) => 
> T: Unit
> 
> // advance clock to indicate that current iteration is finished.
> def clock(): Unit
>  
> // block until all workers have reached this line of code.
> def sync(): Unit
> {code}
> *`PSContext` provides following functions to use on driver:*
> {code}
> // load parameters from existing rdd.
> def loadPSModel[T](model: RDD[String, T]) 
> // fetch parameters from parameter server to construct 

[jira] [Commented] (SPARK-16937) Confusing behaviors when View and Temp View sharing the same names

2016-08-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410991#comment-15410991
 ] 

Xiao Li commented on SPARK-16937:
-

We are using different name resolution rules; should we make them consistent?

> Confusing behaviors when View and Temp View sharing the same names
> --
>
> Key: SPARK-16937
> URL: https://issues.apache.org/jira/browse/SPARK-16937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> {noformat}
> sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
> ID < 3")
> sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
> // This returns the contents of the temp view.
> sql(s"select * from $viewName").show(false)
> // This returns the contents of the view.
> sql(s"select * from default.$viewName").show(false)
> // Below is to drop the temp view
> sql(s"DROP VIEW $viewName")
> // Both results are non-temp view
> sql(s"select * from $viewName").show(false)
> sql(s"select * from default.$viewName").show(false)
> // After another drop, the non-temp view is dropped.
> sql(s"DROP VIEW $viewName")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16937) Confusing behaviors when View and Temp View sharing the same names

2016-08-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410989#comment-15410989
 ] 

Xiao Li commented on SPARK-16937:
-

Currently, the existing DDL behaviors for views when users do not specify the 
database name are described below:

{{CREATE OR REPLACE TEMPORARY VIEW view_name}} creates/alters the {{TEMPORARY}} 
view.

{{CREATE OR REPLACE VIEW view_name}} creates/alters the {{PERSISTENT}} view.

{{DROP VIEW view_name}} or {{SELECT ... FROM view_name}} is always first applied 
to a {{TEMPORARY}} view, if one exists. If the temporary view does not exist, we 
try to drop/fetch the {{PERSISTENT}} view, if it exists.

{{ALTER VIEW view_name}} is only applicable to the {{PERSISTENT}} view, even if 
a temporary view with the same name exists.

> Confusing behaviors when View and Temp View sharing the same names
> --
>
> Key: SPARK-16937
> URL: https://issues.apache.org/jira/browse/SPARK-16937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> {noformat}
> sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
> ID < 3")
> sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
> // This returns the contents of the temp view.
> sql(s"select * from $viewName").show(false)
> // This returns the contents of the view.
> sql(s"select * from default.$viewName").show(false)
> // Below is to drop the temp view
> sql(s"DROP VIEW $viewName")
> // Both results are non-temp view
> sql(s"select * from $viewName").show(false)
> sql(s"select * from default.$viewName").show(false)
> // After another drop, the non-temp view is dropped.
> sql(s"DROP VIEW $viewName")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16939) Fix build error by using `Tuple1` explicitly in StringFunctionSuite

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16939:


Assignee: Apache Spark

> Fix build error by using `Tuple1` explicitly in StringFunctionSuite
> ---
>
> Key: SPARK-16939
> URL: https://issues.apache.org/jira/browse/SPARK-16939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> This issue aims to fix a build error on branch 1.6, but we should have this 
> in the master branch, too. There are other ongoing PRs as well.
> {code}
> [error] 
> /home/jenkins/workspace/spark-branch-1.6-compile-maven-with-yarn-2.3/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala:82:
>  value toDF is not a member of Seq[String]
> [error] val df = Seq("c").toDF("s")
> [error]   ^
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16939) Fix build error by using `Tuple1` explicitly in StringFunctionSuite

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16939:


Assignee: (was: Apache Spark)

> Fix build error by using `Tuple1` explicitly in StringFunctionSuite
> ---
>
> Key: SPARK-16939
> URL: https://issues.apache.org/jira/browse/SPARK-16939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to fix a build error on branch 1.6, but we should have this 
> in the master branch, too. There are other ongoing PRs as well.
> {code}
> [error] 
> /home/jenkins/workspace/spark-branch-1.6-compile-maven-with-yarn-2.3/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala:82:
>  value toDF is not a member of Seq[String]
> [error] val df = Seq("c").toDF("s")
> [error]   ^
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16939) Fix build error by using `Tuple1` explicitly in StringFunctionSuite

2016-08-07 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-16939:
-

 Summary: Fix build error by using `Tuple1` explicitly in 
StringFunctionSuite
 Key: SPARK-16939
 URL: https://issues.apache.org/jira/browse/SPARK-16939
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Dongjoon Hyun
Priority: Minor


This issue aims to fix a build error on branch 1.6, but we should have this in 
the master branch, too. There are other ongoing PRs as well.

{code}
[error] 
/home/jenkins/workspace/spark-branch-1.6-compile-maven-with-yarn-2.3/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala:82:
 value toDF is not a member of Seq[String]
[error] val df = Seq("c").toDF("s")
[error]   ^
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12436) If all values of a JSON field is null, JSON's inferSchema should return NullType instead of StringType

2016-08-07 Thread Hasil Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410946#comment-15410946
 ] 

Hasil Sharma commented on SPARK-12436:
--

Is this issue solved? If not, I would like to contribute.

> If all values of a JSON field is null, JSON's inferSchema should return 
> NullType instead of StringType
> --
>
> Key: SPARK-12436
> URL: https://issues.apache.org/jira/browse/SPARK-12436
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>  Labels: starter
>
> Right now, JSON's inferSchema will return {{StringType}} for a field that 
> always has null values or an {{ArrayType(StringType)}}  for a field that 
> always has empty array values. Although this behavior makes writing JSON data 
> to other data sources easy (i.e. when writing data, we do not need to remove 
> those {{NullType}} or {{ArrayType(NullType)}} columns), it makes it hard for 
> downstream applications to reason about the actual schema of the data and thus 
> makes schema merging hard. We should allow JSON's inferSchema to return 
> {{NullType}} and {{ArrayType(NullType)}}. Also, we need to make sure that when 
> we write data out, we remove those {{NullType}} or {{ArrayType(NullType)}} 
> columns first. 
> Besides  {{NullType}} and {{ArrayType(NullType)}}, we may need to do the same 
> thing for empty {{StructType}}s (i.e. a {{StructType}} having 0 fields). 
> To finish this work, we need to finish the following sub-tasks:
> * Allow JSON's inferSchema to return {{NullType}} and {{ArrayType(NullType)}}.
> * Determine whether we need to add the operation of removing {{NullType}} and 
> {{ArrayType(NullType)}} columns from the data that will be written out for all 
> data sources (i.e. data sources based on our data source API and Hive tables), 
> or whether we should add this operation only for certain data sources (e.g. 
> Parquet). For example, we may not need this operation for Hive because Hive 
> has VoidObjectInspector.
> * Implement the change and get it merged to Spark master.
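
A small illustration of the current behavior (a sketch, assuming a SparkContext 
{{sc}} and SparkSession {{spark}} are available):

{code}
// Current behavior described above: all-null and empty-array fields are
// inferred as StringType / ArrayType(StringType) rather than null types.
val rdd = sc.parallelize(Seq("""{"a": null, "b": []}""", """{"a": null, "b": []}"""))
spark.read.json(rdd).printSchema()
// `a` comes back as string and `b` as array<string>; the proposal is to
// infer NullType / ArrayType(NullType) instead.
{code}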



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Mathieu D (JIRA)
Mathieu D created SPARK-16938:
-

 Summary: Cannot resolve column name after a join
 Key: SPARK-16938
 URL: https://issues.apache.org/jira/browse/SPARK-16938
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Mathieu D
Priority: Minor


Found a change of behavior on spark-2.0.0, which breaks a query in our code 
base.

The following works on previous spark versions, 1.6.1 up to 2.0.0-preview :
{code}
val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", "dfb.id"))
{code}

but fails with spark-2.0.0 with the exception : 
{code}
Cannot resolve column name "dfa.id" among (id, a, id, b); 
org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" 
among (id, a, id, b);
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
at 
org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
...
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16324) regexp_extract should doc that it returns empty string when match fails

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410927#comment-15410927
 ] 

Apache Spark commented on SPARK-16324:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/14525

> regexp_extract should doc that it returns empty string when match fails
> ---
>
> Key: SPARK-16324
> URL: https://issues.apache.org/jira/browse/SPARK-16324
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> The documentation for regexp_extract isn't clear about how it should behave 
> if the regex didn't match the row. However, the Java documentation it refers 
> to for further detail suggests that the return value should be null if the group 
> wasn't matched at all, an empty string if the group actually matched an empty 
> string, and an exception raised if the entire regex didn't match.
> This would be identical to how python's own re module behaves when a 
> MatchObject.group() is called.
> However, in practice regexp_extract() returns empty string when the match 
> fails. This seems to be a bug; if it was intended as a feature, it should 
> have been documented as such - and it was probably not a good idea since it 
> can result in silent bugs.
> {code}
> import pyspark.sql.functions as F
> df = spark.createDataFrame([['abc']], ['text'])
> assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == ''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16324) regexp_extract should doc that it returns empty string when match fails

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16324:


Assignee: Apache Spark

> regexp_extract should doc that it returns empty string when match fails
> ---
>
> Key: SPARK-16324
> URL: https://issues.apache.org/jira/browse/SPARK-16324
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Assignee: Apache Spark
>Priority: Minor
>
> The documentation for regexp_extract isn't clear about how it should behave 
> if the regex didn't match the row. However, the Java documentation it refers 
> to for further detail suggests that the return value should be null if the group 
> wasn't matched at all, an empty string if the group actually matched an empty 
> string, and an exception raised if the entire regex didn't match.
> This would be identical to how python's own re module behaves when a 
> MatchObject.group() is called.
> However, in practice regexp_extract() returns empty string when the match 
> fails. This seems to be a bug; if it was intended as a feature, it should 
> have been documented as such - and it was probably not a good idea since it 
> can result in silent bugs.
> {code}
> import pyspark.sql.functions as F
> df = spark.createDataFrame([['abc']], ['text'])
> assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == ''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16324) regexp_extract should doc that it returns empty string when match fails

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16324:


Assignee: (was: Apache Spark)

> regexp_extract should doc that it returns empty string when match fails
> ---
>
> Key: SPARK-16324
> URL: https://issues.apache.org/jira/browse/SPARK-16324
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> The documentation for regexp_extract isn't clear about how it should behave 
> if the regex didn't match the row. However, the Java documentation it refers 
> to for further detail suggests that the return value should be null if the group 
> wasn't matched at all, an empty string if the group actually matched an empty 
> string, and an exception raised if the entire regex didn't match.
> This would be identical to how python's own re module behaves when a 
> MatchObject.group() is called.
> However, in practice regexp_extract() returns empty string when the match 
> fails. This seems to be a bug; if it was intended as a feature, it should 
> have been documented as such - and it was probably not a good idea since it 
> can result in silent bugs.
> {code}
> import pyspark.sql.functions as F
> df = spark.createDataFrame([['abc']], ['text'])
> assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == ''
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-08-07 Thread xubo245 (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410923#comment-15410923
 ] 

xubo245 commented on SPARK-15575:
-

If we remove the breeze dependency, do we need to rewrite a similar project?
I think we can build an mllib linalg project that depends on breeze or another 
library. We could also update that project if breeze cannot support 
Scala 2.12.


> Remove breeze from dependencies?
> 
>
> Key: SPARK-15575
> URL: https://issues.apache.org/jira/browse/SPARK-15575
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> This JIRA is for discussing whether we should remove Breeze from the 
> dependencies of MLlib.  The main issues with Breeze are Scala 2.12 support 
> and performance issues.
> There are a few paths:
> # Keep dependency.  This could be OK, especially if the Scala version issues 
> are fixed within Breeze.
> # Remove dependency
> ## Implement our own linear algebra operators as needed
> ## Design a way to build Spark using custom linalg libraries of the user's 
> choice.  E.g., you could build MLlib using Breeze, or any other library 
> supporting the required operations.  This might require significant work.  
> See [SPARK-6442] for related discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16409) regexp_extract with optional groups causes NPE

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-16409:
-

Assignee: Sean Owen

> regexp_extract with optional groups causes NPE
> --
>
> Key: SPARK-16409
> URL: https://issues.apache.org/jira/browse/SPARK-16409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Assignee: Sean Owen
> Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> df = sqlContext.createDataFrame([['c']], ['s'])
> df.select(F.regexp_extract('s', r'(a+)(b)?(c)', 2)).collect()
> causes NPE. Worse, in a large program it doesn't cause NPE instantly; it 
> actually works fine, until some unpredictable (and inconsistent) moment in 
> the future when (presumably) the invalid memory access occurs, and then it 
> fails. For this reason, it took several hours to debug this.
> Suggestion: either fill the group with null; or raise exception immediately 
> after examining the argument with a message that optional groups are not 
> allowed.
> Traceback:
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df.select(F.regexp_extract('s', r'(a+)(b)?(c)', 2)).collect()
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\pyspark\sql\dataframe.py
>  in collect(self)
> 294 """
> 295 with SCCallSiteSync(self._sc) as css:
> --> 296 port = self._jdf.collectToPython()
> 297 return list(_load_from_socket(port, 
> BatchedSerializer(PickleSerializer(
> 298 
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934 
> 935 for temp_arg in temp_args:
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\pyspark\sql\utils.py
>  in deco(*a, **kw)
>  55 def deco(*a, **kw):
>  56 try:
> ---> 57 return f(*a, **kw)
>  58 except py4j.protocol.Py4JJavaError as e:
>  59 s = e.java_exception.toString()
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o51.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:883)
>   at 
> 

[jira] [Resolved] (SPARK-16409) regexp_extract with optional groups causes NPE

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16409.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1
   1.6.3

Issue resolved by pull request 14504
[https://github.com/apache/spark/pull/14504]

> regexp_extract with optional groups causes NPE
> --
>
> Key: SPARK-16409
> URL: https://issues.apache.org/jira/browse/SPARK-16409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Max Moroz
> Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> df = sqlContext.createDataFrame([['c']], ['s'])
> df.select(F.regexp_extract('s', r'(a+)(b)?(c)', 2)).collect()
> causes NPE. Worse, in a large program it doesn't cause NPE instantly; it 
> actually works fine, until some unpredictable (and inconsistent) moment in 
> the future when (presumably) the invalid memory access occurs, and then it 
> fails. For this reason, it took several hours to debug this.
> Suggestion: either fill the group with null; or raise exception immediately 
> after examining the argument with a message that optional groups are not 
> allowed.
> Traceback:
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df.select(F.regexp_extract('s', r'(a+)(b)?(c)', 2)).collect()
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\pyspark\sql\dataframe.py
>  in collect(self)
> 294 """
> 295 with SCCallSiteSync(self._sc) as css:
> --> 296 port = self._jdf.collectToPython()
> 297 return list(_load_from_socket(port, 
> BatchedSerializer(PickleSerializer(
> 298 
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934 
> 935 for temp_arg in temp_args:
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\pyspark\sql\utils.py
>  in deco(*a, **kw)
>  55 def deco(*a, **kw):
>  56 try:
> ---> 57 return f(*a, **kw)
>  58 except py4j.protocol.Py4JJavaError as e:
>  59 s = e.java_exception.toString()
> C:\Users\me\Downloads\spark-2.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o51.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at 
> 

[jira] [Updated] (SPARK-16909) Streaming for postgreSQL JDBC driver

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16909:
--
  Assignee: prince john wesley
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

> Streaming for postgreSQL JDBC driver
> 
>
> Key: SPARK-16909
> URL: https://issues.apache.org/jira/browse/SPARK-16909
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: prince john wesley
>Assignee: prince john wesley
>Priority: Minor
> Fix For: 2.1.0
>
>
> The PostgreSQL JDBC driver sets 0 as the default record fetch size, which means 
> it caches all rows irrespective of the row count. 
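
For readers hitting this, a minimal sketch of setting an explicit fetch size on a 
JDBC read so the driver can stream rows instead of caching them all (the connection 
details below are hypothetical):

{code}
// Sketch with assumed URL/table names: explicit fetchsize for a PostgreSQL read.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")  // hypothetical database
  .option("dbtable", "public.my_table")                    // hypothetical table
  .option("user", "spark")                                 // hypothetical credentials
  .option("fetchsize", "1000")
  .load()
{code}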



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16909) Streaming for postgreSQL JDBC driver

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16909.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14502
[https://github.com/apache/spark/pull/14502]

> Streaming for postgreSQL JDBC driver
> 
>
> Key: SPARK-16909
> URL: https://issues.apache.org/jira/browse/SPARK-16909
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: prince john wesley
> Fix For: 2.1.0
>
>
> The PostgreSQL JDBC driver sets 0 as the default record fetch size, which means 
> it caches all rows irrespective of the row count. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16832:


Assignee: (was: Apache Spark)

> CrossValidator and TrainValidationSplit are not random without seed
> ---
>
> Key: SPARK-16832
> URL: https://issues.apache.org/jira/browse/SPARK-16832
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> Repeatedly running CrossValidator or TrainValidationSplit without an explicit 
> seed parameter does not change results. It is supposed to be seeded with a 
> random seed, but it seems to be instead seeded with some constant. (If seed 
> is explicitly provided, the two classes behave as expected.)
> {code}
> dataset = spark.createDataFrame(
>   [(Vectors.dense([0.0]), 0.0),
>(Vectors.dense([0.4]), 1.0),
>(Vectors.dense([0.5]), 0.0),
>(Vectors.dense([0.6]), 1.0),
>(Vectors.dense([1.0]), 1.0)] * 1000,
>   ["features", "label"]).cache()
> paramGrid = pyspark.ml.tuning.ParamGridBuilder().build()
> tvs = 
> pyspark.ml.tuning.TrainValidationSplit(estimator=pyspark.ml.regression.LinearRegression(),
>  
>estimatorParamMaps=paramGrid,
>
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>trainRatio=0.8)
> model = tvs.fit(train)
> print(model.validationMetrics)
> for folds in (3, 5, 10):
>   cv = 
> pyspark.ml.tuning.CrossValidator(estimator=pyspark.ml.regression.LinearRegression(),
>  
>   estimatorParamMaps=paramGrid, 
>   
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>   numFolds=folds
>  )
>   cvModel = cv.fit(dataset)
>   print(folds, cvModel.avgMetrics)
> {code}
> This code produces identical results upon repeated calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410913#comment-15410913
 ] 

Apache Spark commented on SPARK-16832:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/14524

> CrossValidator and TrainValidationSplit are not random without seed
> ---
>
> Key: SPARK-16832
> URL: https://issues.apache.org/jira/browse/SPARK-16832
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> Repeatedly running CrossValidator or TrainValidationSplit without an explicit 
> seed parameter does not change results. It is supposed to be seeded with a 
> random seed, but it seems to be instead seeded with some constant. (If seed 
> is explicitly provided, the two classes behave as expected.)
> {code}
> dataset = spark.createDataFrame(
>   [(Vectors.dense([0.0]), 0.0),
>(Vectors.dense([0.4]), 1.0),
>(Vectors.dense([0.5]), 0.0),
>(Vectors.dense([0.6]), 1.0),
>(Vectors.dense([1.0]), 1.0)] * 1000,
>   ["features", "label"]).cache()
> paramGrid = pyspark.ml.tuning.ParamGridBuilder().build()
> tvs = 
> pyspark.ml.tuning.TrainValidationSplit(estimator=pyspark.ml.regression.LinearRegression(),
>  
>estimatorParamMaps=paramGrid,
>
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>trainRatio=0.8)
> model = tvs.fit(train)
> print(model.validationMetrics)
> for folds in (3, 5, 10):
>   cv = 
> pyspark.ml.tuning.CrossValidator(estimator=pyspark.ml.regression.LinearRegression(),
>  
>   estimatorParamMaps=paramGrid, 
>   
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>   numFolds=folds
>  )
>   cvModel = cv.fit(dataset)
>   print(folds, cvModel.avgMetrics)
> {code}
> This code produces identical results upon repeated calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-07 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16832:


Assignee: Apache Spark

> CrossValidator and TrainValidationSplit are not random without seed
> ---
>
> Key: SPARK-16832
> URL: https://issues.apache.org/jira/browse/SPARK-16832
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Assignee: Apache Spark
>Priority: Minor
>
> Repeatedly running CrossValidator or TrainValidationSplit without an explicit 
> seed parameter does not change results. It is supposed to be seeded with a 
> random seed, but it seems to be instead seeded with some constant. (If seed 
> is explicitly provided, the two classes behave as expected.)
> {code}
> dataset = spark.createDataFrame(
>   [(Vectors.dense([0.0]), 0.0),
>(Vectors.dense([0.4]), 1.0),
>(Vectors.dense([0.5]), 0.0),
>(Vectors.dense([0.6]), 1.0),
>(Vectors.dense([1.0]), 1.0)] * 1000,
>   ["features", "label"]).cache()
> paramGrid = pyspark.ml.tuning.ParamGridBuilder().build()
> tvs = 
> pyspark.ml.tuning.TrainValidationSplit(estimator=pyspark.ml.regression.LinearRegression(),
>  
>estimatorParamMaps=paramGrid,
>
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>trainRatio=0.8)
> model = tvs.fit(train)
> print(model.validationMetrics)
> for folds in (3, 5, 10):
>   cv = 
> pyspark.ml.tuning.CrossValidator(estimator=pyspark.ml.regression.LinearRegression(),
>  
>   estimatorParamMaps=paramGrid, 
>   
> evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>   numFolds=folds
>  )
>   cvModel = cv.fit(dataset)
>   print(folds, cvModel.avgMetrics)
> {code}
> This code produces identical results upon repeated calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10063) Remove DirectParquetOutputCommitter

2016-08-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410870#comment-15410870
 ] 

Steve Loughran commented on SPARK-10063:


The solution for this is going to be s3guard, HADOOP-13345, which adds 
Dynamo-backed metadata storage for atomic/consistent operations, plus, as an added 
bonus, the ability to skip S3 HTTP calls in getFileStatus(). That will be the 
foundation for an output committer that can handle speculative commits and bypass 
the rename.

That work is just starting up, and I would strongly encourage you to get involved, 
making sure your needs are represented, and helping to test it.

Until then: 
# switch your code to using s3a and Hadoop 2.7.2+; it's better all round, gets 
better in Hadoop 2.8, and is the basis for s3guard.
# use the Hadoop {{FileOutputCommitter}} and set 
{{mapreduce.fileoutputcommitter.algorithm.version}} to 2 (a minimal config sketch 
follows below).
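
A minimal config sketch for #2 (assuming a SparkSession named {{spark}}):

{code}
// Set the committer algorithm on the Hadoop configuration Spark uses.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.algorithm.version", "2")

// Equivalently, at submit time:
//   --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
{code}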



> Remove DirectParquetOutputCommitter
> ---
>
> Key: SPARK-10063
> URL: https://issues.apache.org/jira/browse/SPARK-10063
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Reynold Xin
>Priority: Critical
> Fix For: 2.0.0
>
>
> When we use DirectParquetOutputCommitter on S3 and speculation is enabled, 
> there is a chance that we can lose data. 
> Here is the code to reproduce the problem.
> {code}
> import org.apache.spark.sql.functions._
> val failSpeculativeTask = sqlContext.udf.register("failSpeculativeTask", (i: 
> Int, partitionId: Int, attemptNumber: Int) => {
>   if (partitionId == 0 && i == 5) {
> if (attemptNumber > 0) {
>   Thread.sleep(15000)
>   throw new Exception("new exception")
> } else {
>   Thread.sleep(1)
> }
>   }
>   
>   i
> })
> val df = sc.parallelize((1 to 100), 20).mapPartitions { iter =>
>   val context = org.apache.spark.TaskContext.get()
>   val partitionId = context.partitionId
>   val attemptNumber = context.attemptNumber
>   iter.map(i => (i, partitionId, attemptNumber))
> }.toDF("i", "partitionId", "attemptNumber")
> df
>   .select(failSpeculativeTask($"i", $"partitionId", 
> $"attemptNumber").as("i"), $"partitionId", $"attemptNumber")
>   .write.mode("overwrite").format("parquet").save("/home/yin/outputCommitter")
> sqlContext.read.load("/home/yin/outputCommitter").count
> // The result is 99 and 5 is missing from the output.
> {code}
> What happened is that the original task finishes first and uploads its output 
> file to S3, then the speculative task somehow fails. Because we have to call 
> the output stream's close method, which uploads data to S3, we actually upload 
> the partial result generated by the failed speculative task to S3, and this 
> file overwrites the correct file generated by the original task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16769) httpclient classic dependency - potentially a patch required?

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16769.
---
Resolution: Not A Problem

I think this doesn't end up being a vulnerability AFAICT, and it would partly go 
away anyway once older Hadoop versions are dropped.

> httpclient classic dependency - potentially a patch required?
> -
>
> Key: SPARK-16769
> URL: https://issues.apache.org/jira/browse/SPARK-16769
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
> Environment: All Spark versions, any environment
>Reporter: Adam Roberts
>Priority: Minor
>
> In our jars folder for Spark we provide a jar with a CVE 
> https://www.versioneye.com/java/commons-httpclient:commons-httpclient/3.1. 
> CVE-2012-5783
> This paper outlines the problem
> www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
> My question is: do we need to ship this version as well, or is it only used 
> for tests? Is it a patched version? I plan to run without this dependency, and 
> if there are NoClassDefFound problems I'll add a test so we 
> don't ship it (downloading it in the first place is bad enough though).
> Note that this is valid for all versions, suggesting it be raised to critical 
> if Spark functionality depends on it, because of what the pdf 
> I've linked to mentions.
> Here is the jar being included:
> ls $SPARK_HOME/jars | grep "httpclient"
> commons-httpclient-3.1.jar
> httpclient-4.5.2.jar
> The first jar potentially contains the security issue, could be a patched 
> version, need to verify. SHA1 sum for this jar is 
> 964cd74171f427720480efdec40a7c7f6e58426a



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16326) Evaluate sparklyr package from RStudio

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16326.
---
Resolution: Not A Problem

> Evaluate sparklyr package from RStudio
> --
>
> Key: SPARK-16326
> URL: https://issues.apache.org/jira/browse/SPARK-16326
> Project: Spark
>  Issue Type: Brainstorming
>  Components: SparkR
>Reporter: Sun Rui
>
> RStudio has developed sparklyr (https://github.com/rstudio/sparklyr), 
> connecting the R community to Spark. A rough review shows that sparklyr provides 
> a dplyr backend and a new API for MLlib and for calling Spark from R. Of 
> course, sparklyr internally uses the low-level mechanism in SparkR.
> We can discuss how to position SparkR with respect to sparklyr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16911) Remove migrating to a Spark 1.x version in programming guide documentation

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16911:
--
Assignee: Shivansh

> Remove migrating to a Spark 1.x version in programming guide documentation
> --
>
> Key: SPARK-16911
> URL: https://issues.apache.org/jira/browse/SPARK-16911
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Shivansh
>Assignee: Shivansh
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16911) Remove migrating to a Spark 1.x version in programming guide documentation

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16911.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14503
[https://github.com/apache/spark/pull/14503]

> Remove migrating to a Spark 1.x version in programming guide documentation
> --
>
> Key: SPARK-16911
> URL: https://issues.apache.org/jira/browse/SPARK-16911
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Shivansh
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16870) add "spark.sql.broadcastTimeout" into docs/sql-programming-guide.md to help people to how to fix this timeout error when it happenned

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16870:
--
Assignee: Liang Ke

> add "spark.sql.broadcastTimeout" into docs/sql-programming-guide.md to help 
> people to how to fix this timeout error when it happenned
> -
>
> Key: SPARK-16870
> URL: https://issues.apache.org/jira/browse/SPARK-16870
> Project: Spark
>  Issue Type: Improvement
>Reporter: Liang Ke
>Assignee: Liang Ke
>Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
>
> Here is my workload and what I found. 
> I run a large number of jobs with spark-sql at the same time and hit the error 
> below, a timeout (some jobs contain a broadcast-join operator): 
> 16/08/03 15:43:23 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING,
> java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:107)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin.doExecute(BroadcastHashOuterJoin.scala:113)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.Filter.doExecute(basicOperators.scala:70)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:201)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecute
> StatementOperation$$execute(SparkExecuteStatementOperation.scala:211)
> at 
> 
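For reference, the timeout in the trace above is governed by the
spark.sql.broadcastTimeout setting this issue asks to document. A minimal
sketch, assuming an existing SQLContext named sqlContext (as in spark-sql or
spark-shell) and an arbitrary 1200-second value:

{code}
// Raise the broadcast-join timeout that produced the 300-second failure above.
// The session variable name and the new value are assumptions for illustration.
sqlContext.setConf("spark.sql.broadcastTimeout", "1200")

// Or at submit time (deployment details are likewise assumed):
//   spark-submit --conf spark.sql.broadcastTimeout=1200 ...
{code}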

[jira] [Resolved] (SPARK-16870) add "spark.sql.broadcastTimeout" into docs/sql-programming-guide.md to help people fix this timeout error when it happens

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16870.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14477
[https://github.com/apache/spark/pull/14477]

> add "spark.sql.broadcastTimeout" into docs/sql-programming-guide.md to help 
> people to how to fix this timeout error when it happenned
> -
>
> Key: SPARK-16870
> URL: https://issues.apache.org/jira/browse/SPARK-16870
> Project: Spark
>  Issue Type: Improvement
>Reporter: Liang Ke
>Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
>
> Here is my workload and what I found. 
> I run a large number of jobs with spark-sql at the same time and hit the error 
> below, a timeout (some jobs contain a broadcast-join operator): 
> 16/08/03 15:43:23 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING,
> java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:107)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin.doExecute(BroadcastHashOuterJoin.scala:113)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.Filter.doExecute(basicOperators.scala:70)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:201)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
> at 
> 

[jira] [Commented] (SPARK-16864) Comprehensive version info

2016-08-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410856#comment-15410856
 ] 

Sean Owen commented on SPARK-16864:
---

Yeah, you've hit it on the head: a git hash is really only relevant to source 
code and some kind of build workflow. Within released software at runtime, the 
only meaningful version indicator is the version x.y.z, or maybe a snapshot 
timestamp, because those are ordered and you might reason about them in code. The 
only thing you can do with a hash is log it. You already have the git hash in 
your workflow and can use it, because you are specifically interested in 
building from hashes and noting them. You haven't provided a use case for writing 
an app that depends on a hash; that is how this works, not "give me any reason not 
to have this".
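
To make the distinction concrete, here is a minimal sketch, assuming only a
live SparkContext named sc, of the kind of check an application can write
against an ordered x.y.z version string but not against a commit hash:

{code}
// Hedged illustration only, not a proposed Spark API: parse the x.y.z prefix of
// sc.version and branch on it. The helper name and the 2.0 threshold are
// arbitrary choices for the example.
def atLeast(version: String, major: Int, minor: Int): Boolean = {
  val parts = version.split("[.-]").take(2).map(_.toInt)
  parts(0) > major || (parts(0) == major && parts(1) >= minor)
}

if (atLeast(sc.version, 2, 0)) {
  // take a code path that needs Spark 2.0 or newer
} else {
  // fall back for older releases
}
{code}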

> Comprehensive version info 
> ---
>
> Key: SPARK-16864
> URL: https://issues.apache.org/jira/browse/SPARK-16864
> Project: Spark
>  Issue Type: Improvement
>Reporter: jay vyas
>
> Spark versions can be grepped out of the Spark banner that comes up on 
> startup, but otherwise there is no programmatic/reliable way to get version 
> information.
> There is also no git commit id, etc., so precise version checking isn't 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16932) Programming-guide Accumulator section should be more clear w.r.t new API

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16932:
--
Assignee: Bryan Cutler

> Programming-guide Accumulator section should be more clear w.r.t new API
> 
>
> Key: SPARK-16932
> URL: https://issues.apache.org/jira/browse/SPARK-16932
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
>
> The programming-guide section on Accumulators starts off describing the old 
> API, which is now deprecated, then shows examples with the new API, and 
> ends with another code snippet of the old API.  For Scala, at least, only 
> the new API should be mentioned, to make clear to the user what is 
> recommended.
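
For context, a minimal sketch of the 2.x accumulator API the guide should
standardize on, assuming a live SparkContext named sc (the deprecated 1.x form
is sc.accumulator(...)):

{code}
// Spark 2.x accumulator API; the name and values are illustrative only.
val acc = sc.longAccumulator("example counter")
sc.parallelize(Seq(1L, 2L, 3L, 4L)).foreach(x => acc.add(x))
println(acc.value)  // 10
{code}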



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16932) Programming-guide Accumulator section should be more clear w.r.t new API

2016-08-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16932.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14516
[https://github.com/apache/spark/pull/14516]

> Programming-guide Accumulator section should be more clear w.r.t new API
> 
>
> Key: SPARK-16932
> URL: https://issues.apache.org/jira/browse/SPARK-16932
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bryan Cutler
>Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
>
> The programming-guide section on Accumulators starts off describing the old 
> API, which is now deprecated, then shows examples with the new API, and 
> ends with another code snippet of the old API.  For Scala, at least, only 
> the new API should be mentioned, to make clear to the user what is 
> recommended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16937) Confusing behaviors when View and Temp View share the same name

2016-08-07 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410848#comment-15410848
 ] 

Xiao Li commented on SPARK-16937:
-

[~rxin] [~yhuai] [~cloud_fan] Do you think the above behaviors are expected? 
IMO, the second statement should stop with an error. Is it a bug? Thanks!

> Confusing behaviors when View and Temp View share the same name
> --
>
> Key: SPARK-16937
> URL: https://issues.apache.org/jira/browse/SPARK-16937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> {noformat}
> sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
> ID < 3")
> sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
> // This returns the contents of the temp view.
> sql(s"select * from $viewName").show(false)
> // This returns the contents of the view.
> sql(s"select * from default.$viewName").show(false)
> // Below is to drop the temp view
> sql(s"DROP VIEW $viewName")
> // Both results are non-temp view
> sql(s"select * from $viewName").show(false)
> sql(s"select * from default.$viewName").show(false)
> // After another drop, the non-temp view is dropped.
> sql(s"DROP VIEW $viewName")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16937) Confusing behaviors when View and Temp View share the same name

2016-08-07 Thread Xiao Li (JIRA)
Xiao Li created SPARK-16937:
---

 Summary: Confusing behaviors when View and Temp View share the 
same name
 Key: SPARK-16937
 URL: https://issues.apache.org/jira/browse/SPARK-16937
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


{noformat}
sql(s"CREATE TEMPORARY VIEW $viewName AS SELECT * FROM $tabName WHERE 
ID < 3")
sql(s"CREATE VIEW $viewName AS SELECT * FROM $tabName")
// This returns the contents of the temp view.
sql(s"select * from $viewName").show(false)
// This returns the contents of the view.
sql(s"select * from default.$viewName").show(false)

// Below is to drop the temp view
sql(s"DROP VIEW $viewName")
// Both results are non-temp view
sql(s"select * from $viewName").show(false)
sql(s"select * from default.$viewName").show(false)

// After another drop, the non-temp view is dropped.
sql(s"DROP VIEW $viewName")
{noformat}
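
Until the resolution order is settled, a hedged sketch of how to make the
intent explicit on each side, reusing the sql(...) helper and $viewName from the
snippet above and assuming a SparkSession named spark for the catalog call:

{noformat}
// Qualifying with the database always hits the permanent view.
sql(s"SELECT * FROM default.$viewName").show(false)

// Drops only the temporary view, whatever permanent view shares the name.
spark.catalog.dropTempView(viewName)

// Now the permanent view can be dropped unambiguously.
sql(s"DROP VIEW default.$viewName")
{noformat}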



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16913) [SQL] Better codegen when querying nested struct

2016-08-07 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410840#comment-15410840
 ] 

Kazuaki Ishizaki commented on SPARK-16913:
--

It seems to copy each element of the struct. Since the {{InternalRow}} does not 
carry the nested structure as a whole, the {{internalRow}} keeps two scalar values, 
each consisting of {{isNull}} and {{value}} in this case. If we can provide a better 
schema property (i.e. {{nullable = false}} for {{a}} and {{b}}), lines 44-62 
would be simpler.
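
To check that, a minimal sketch (assuming the Parquet file from the description
and a SparkSession named spark) of supplying the tighter schema with nullable =
false for a and b; whether a file-based source honors the user-supplied
nullability is exactly what the experiment would need to verify:

{code}
import org.apache.spark.sql.types._

// Declare c, a and b as non-nullable and hand the schema to the reader.
val schema = StructType(Seq(
  StructField("c", StructType(Seq(
    StructField("a", LongType, nullable = false),
    StructField("b", LongType, nullable = false)
  )), nullable = false)
))

val df = spark.read.schema(schema).parquet("/mnt/mfs/codegen_test")
df.selectExpr("c.*").explain()  // compare the plan/generated code with the one quoted above
{code}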

> [SQL] Better codegen when querying nested struct
> -
>
> Key: SPARK-16913
> URL: https://issues.apache.org/jira/browse/SPARK-16913
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>
> I have a Parquet file created as a result of:
> {code}
> spark.range(100).selectExpr("id as a", "id as b").selectExpr("struct(a, b) as c").write.parquet("/mnt/mfs/codegen_test")
> {code}
> Then I'm querying the whole nested structure with:
> {code}
> spark.read.parquet("/mnt/mfs/codegen_test").selectExpr("c.*")
> {code}
> As a result of Spark whole-stage codegen I'm getting the following code.
> Is it possible to remove the part starting at line 044 and just return the whole 
> result of getStruct (maybe just copied)?
> {code}
> Generated code:
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends 
> org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> scan_numOutputRows;
> /* 008 */   private scala.collection.Iterator scan_input;
> /* 009 */   private UnsafeRow scan_result;
> /* 010 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder scan_holder;
> /* 011 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> scan_rowWriter;
> /* 012 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> scan_rowWriter1;
> /* 013 */   private UnsafeRow project_result;
> /* 014 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
> /* 015 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> project_rowWriter;
> /* 016 */
> /* 017 */   public GeneratedIterator(Object[] references) {
> /* 018 */ this.references = references;
> /* 019 */   }
> /* 020 */
> /* 021 */   public void init(int index, scala.collection.Iterator inputs[]) {
> /* 022 */ partitionIndex = index;
> /* 023 */ this.scan_numOutputRows = 
> (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
> /* 024 */ scan_input = inputs[0];
> /* 025 */ scan_result = new UnsafeRow(1);
> /* 026 */ this.scan_holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(scan_result, 
> 32);
> /* 027 */ this.scan_rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_holder,
>  1);
> /* 028 */ this.scan_rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_holder,
>  2);
> /* 029 */ project_result = new UnsafeRow(2);
> /* 030 */ this.project_holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result,
>  0);
> /* 031 */ this.project_rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder,
>  2);
> /* 032 */   }
> /* 033 */
> /* 034 */   protected void processNext() throws java.io.IOException {
> /* 035 */ while (scan_input.hasNext()) {
> /* 036 */   InternalRow scan_row = (InternalRow) scan_input.next();
> /* 037 */   scan_numOutputRows.add(1);
> /* 038 */   boolean scan_isNull = scan_row.isNullAt(0);
> /* 039 */   InternalRow scan_value = scan_isNull ? null : 
> (scan_row.getStruct(0, 2));
> /* 040 */
> /* 041 */   boolean project_isNull = scan_isNull;
> /* 042 */   long project_value = -1L;
> /* 043 */
> /* 044 */   if (!scan_isNull) {
> /* 045 */ if (scan_value.isNullAt(0)) {
> /* 046 */   project_isNull = true;
> /* 047 */ } else {
> /* 048 */   project_value = scan_value.getLong(0);
> /* 049 */ }
> /* 050 */
> /* 051 */   }
> /* 052 */   boolean project_isNull2 = scan_isNull;
> /* 053 */   long project_value2 = -1L;
> /* 054 */
> /* 055 */   if (!scan_isNull) {
> /* 056 */ if (scan_value.isNullAt(1)) {
> /* 057 */   project_isNull2 = true;
> /* 058 */ } else {
> /* 059 */   project_value2 = scan_value.getLong(1);
> /* 060 */ }
> /* 061 */
> /* 062 */   }
> /* 063 */   project_rowWriter.zeroOutNullBytes();
> /* 

[jira] [Commented] (SPARK-8904) When using LDA DAGScheduler throws exception

2016-08-07 Thread Nabarun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410835#comment-15410835
 ] 

Nabarun commented on SPARK-8904:


This seems to be related to something I am seeing at my end too. I 
converted my countVectors into a DF of (id, countVector) pairs:

val ldaDF = countVectors.map { case Row(id: Long, countVector: Vector) => (id, countVector) }

When I try to display it, this throws the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 3148.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3148.0 
(TID 11632, 10.209.235.85): scala.MatchError: 
[0,(1671,[1,2,3,5,8,10,11,12,14,15,17,18,20,21,23,27,28,29,30,31,32,36,37,38,39,41,42,43,45,46,51,52,54,66,69,71,74,75,78,80,82,83,85,88,89,90,92,96,97,98,99,102,104,106,107,108,109,111,112,115,118,121,123,124,126,134,138,139,143,144,145,148,150,151,152,153,155,161,166,171,172,173,174,176,178,179,180,181,189,190,197,199,200,201,207,209,212,216,217,218,220,222,223,224,226,227,228,232,234,238,240,244,246,250,252,254,255,260,261,262,264,268,269,270,277,280,281,282,286,292,294,295,296,297,301,310,312,314,316,318,323,324,325,337,341,343,346,347,351,355,359,366,367,379,380,381,388,390,391,398,403,405,411,417,442,444,448,456,460,464,466,468,470,477,480,484,487,490,491,495,496,501,502,507,509,512,522,523,527,529,531,533,534,535,552,554,556,557,565,566,567,569,574,575,585,624,630,632,633,638,644,646,652,653,658,668,669,670,680,683,686,690,693,696,698,704,705,712,723,726,736,746,747,750,757,758,761,765,773,774,775,783,786,796,797,801,807,811,815,825,830,833,843,844,845,847,849,859,861,862,864,867,871,872,876,879,882,892,895,896,897,912,923,924,935,937,941,944,945,948,949,952,968,982,989,1000,1003,1015,1018,1021,1025,1029,1034,1036,1038,1041,1048,1072,1082,1086,1092,1106,,1114,1117,1123,1128,1133,1135,1145,1149,1154,1168,1169,1171,1178,1180,1181,1183,1184,1201,1224,1234,1240,1250,1260,1261,1267,1269,1270,1280,1305,1309,1317,1333,1354,1355,1358,1378,1379,1386,1389,1393,1411,1413,1426,1428,1475,1480,1504,1506,1521,1525,1530,1532,1545,1555,1601,1614,1635,1643,1649,1653,1668],[1.0,5.0,4.0,3.0,2.0,14.0,30.0,2.0,72.0,9.0,6.0,6.0,1.0,13.0,1.0,4.0,1.0,3.0,2.0,10.0,2.0,4.0,74.0,3.0,11.0,1.0,35.0,1.0,16.0,1.0,2.0,15.0,3.0,4.0,17.0,2.0,8.0,60.0,35.0,3.0,1.0,33.0,2.0,2.0,3.0,11.0,16.0,2.0,8.0,2.0,3.0,48.0,1.0,1.0,4.0,8.0,4.0,3.0,4.0,4.0,1.0,3.0,1.0,11.0,1.0,2.0,3.0,1.0,35.0,6.0,2.0,1.0,2.0,3.0,3.0,4.0,2.0,2.0,1.0,1.0,20.0,9.0,6.0,17.0,10.0,8.0,1.0,12.0,1.0,3.0,3.0,2.0,9.0,1.0,2.0,19.0,1.0,2.0,1.0,1.0,2.0,9.0,1.0,1.0,1.0,5.0,1.0,2.0,5.0,1.0,1.0,1.0,1.0,1.0,7.0,1.0,14.0,2.0,2.0,1.0,5.0,2.0,5.0,5.0,20.0,2.0,27.0,3.0,4.0,11.0,1.0,3.0,3.0,1.0,2.0,2.0,7.0,5.0,2.0,2.0,1.0,3.0,1.0,2.0,1.0,2.0,8.0,5.0,1.0,5.0,3.0,1.0,4.0,3.0,3.0,4.0,1.0,3.0,4.0,1.0,2.0,3.0,5.0,7.0,1.0,8.0,1.0,2.0,4.0,2.0,1.0,12.0,5.0,1.0,6.0,4.0,2.0,2.0,1.0,1.0,3.0,4.0,1.0,1.0,2.0,4.0,3.0,1.0,2.0,6.0,1.0,1.0,1.0,4.0,2.0,1.0,7.0,12.0,1.0,12.0,1.0,1.0,9.0,2.0,1.0,2.0,1.0,1.0,6.0,6.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,3.0,1.0,1.0,2.0,1.0,3.0,1.0,4.0,1.0,5.0,2.0,1.0,2.0,2.0,3.0,1.0,2.0,1.0,1.0,2.0,3.0,1.0,4.0,3.0,1.0,3.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0,2.0,2.0,1.0,1.0,2.0,3.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,1.0,5.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])]
 (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
at 
line1907dd16af5d4fbfa217a9d52f096b36316.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:142)
at 
line1907dd16af5d4fbfa217a9d52f096b36316.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:142)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:231)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:225)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:790)
at
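
A hedged note on the MatchError above: the value that fails to match is the
whole row, which usually means the Vector type in the pattern is not the type
actually stored in the column (in Spark 2.0, ML pipeline output uses
org.apache.spark.ml.linalg.Vector rather than the older mllib type). A sketch of
an extraction that sidesteps the pattern, reusing countVectors and its
(id, vector) column order from the snippet above:

{code}
import org.apache.spark.ml.linalg.Vector

// Positional getters avoid the type pattern entirely; the column order is the
// assumption carried over from the snippet above.
val ldaRDD = countVectors.rdd.map { row =>
  (row.getLong(0), row.getAs[Vector](1))
}
{code}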