[jira] [Commented] (SPARK-16685) audit release docs are ambiguous

2016-07-24 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391168#comment-15391168
 ] 

Patrick Wendell commented on SPARK-16685:
-

These scripts are pretty old and I'm not sure if anyone still uses them. I had 
written them a while back as sanity tests for some release builds. Today, those 
things are tested broadly by the community, so I think this has become 
redundant. [~rxin], are these still used? If not, it might be good to remove 
them from the source repo.

> audit release docs are ambiguous
> 
>
> Key: SPARK-16685
> URL: https://issues.apache.org/jira/browse/SPARK-16685
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.6.2
>Reporter: jay vyas
>Priority: Minor
>
> The dev/audit-release tooling is ambiguous.
> - Should it run against a real cluster? If so, when?
> - What should be in the release repo? Just jars? Tarballs? (I assume jars 
> because it's a .ivy, but I'm not sure.)
> - 
> https://github.com/apache/spark/tree/master/dev/audit-release



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download

2016-03-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-13855.
-
   Resolution: Fixed
Fix Version/s: 1.6.1

> Spark 1.6.1 artifacts not found in S3 bucket / direct download
> --
>
> Key: SPARK-13855
> URL: https://issues.apache.org/jira/browse/SPARK-13855
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.6.1
> Environment: production
>Reporter: Sandesh Deshmane
>Assignee: Patrick Wendell
> Fix For: 1.6.1
>
>
> Getting the error below while deploying Spark on EC2 with version 1.6.1:
> [timing] scala init:  00h 00m 12s
> Initializing spark
> --2016-03-14 07:05:30--  
> http://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
> Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.50.12
> Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.50.12|:80... 
> connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2016-03-14 07:05:30 ERROR 404: Not Found.
> ERROR: Unknown Spark version
> spark/init.sh: line 137: return: -1: invalid option
> return: usage: return [n]
> Unpacking Spark
> tar (child): spark-*.tgz: Cannot open: No such file or directory
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> rm: cannot remove `spark-*.tgz': No such file or directory
> mv: missing destination file operand after `spark'
> Try `mv --help' for more information.
> Checked the s3 bucket spark-related-packages and noticed that no Spark 1.6.1 
> artifacts are present.






[jira] [Commented] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download

2016-03-16 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196901#comment-15196901
 ] 

Patrick Wendell commented on SPARK-13855:
-

I've uploaded the artifacts, thanks.

> Spark 1.6.1 artifacts not found in S3 bucket / direct download
> --
>
> Key: SPARK-13855
> URL: https://issues.apache.org/jira/browse/SPARK-13855
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.6.1
> Environment: production
>Reporter: Sandesh Deshmane
>Assignee: Patrick Wendell
> Fix For: 1.6.1
>
>
> Getting the error below while deploying Spark on EC2 with version 1.6.1:
> [timing] scala init:  00h 00m 12s
> Initializing spark
> --2016-03-14 07:05:30--  
> http://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
> Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.50.12
> Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.50.12|:80... 
> connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2016-03-14 07:05:30 ERROR 404: Not Found.
> ERROR: Unknown Spark version
> spark/init.sh: line 137: return: -1: invalid option
> return: usage: return [n]
> Unpacking Spark
> tar (child): spark-*.tgz: Cannot open: No such file or directory
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> rm: cannot remove `spark-*.tgz': No such file or directory
> mv: missing destination file operand after `spark'
> Try `mv --help' for more information.
> Checked the s3 bucket spark-related-packages and noticed that no Spark 1.6.1 
> artifacts are present.






[jira] [Assigned] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download

2016-03-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reassigned SPARK-13855:
---

Assignee: Patrick Wendell  (was: Michael Armbrust)

> Spark 1.6.1 artifacts not found in S3 bucket / direct download
> --
>
> Key: SPARK-13855
> URL: https://issues.apache.org/jira/browse/SPARK-13855
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.6.1
> Environment: production
>Reporter: Sandesh Deshmane
>Assignee: Patrick Wendell
>
> Getting the error below while deploying Spark on EC2 with version 1.6.1:
> [timing] scala init:  00h 00m 12s
> Initializing spark
> --2016-03-14 07:05:30--  
> http://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
> Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.50.12
> Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.50.12|:80... 
> connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2016-03-14 07:05:30 ERROR 404: Not Found.
> ERROR: Unknown Spark version
> spark/init.sh: line 137: return: -1: invalid option
> return: usage: return [n]
> Unpacking Spark
> tar (child): spark-*.tgz: Cannot open: No such file or directory
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> rm: cannot remove `spark-*.tgz': No such file or directory
> mv: missing destination file operand after `spark'
> Try `mv --help' for more information.
> Checked the s3 bucket spark-related-packages and noticed that no Spark 1.6.1 
> artifacts are present.






[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame

2015-12-10 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-12148:

Priority: Major  (was: Critical)

> SparkR: rename DataFrame to SparkDataFrame
> --
>
> Key: SPARK-12148
> URL: https://issues.apache.org/jira/browse/SPARK-12148
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SparkR
>Reporter: Michael Lawrence
>
> The SparkR package represents a Spark DataFrame with the class "DataFrame". 
> That conflicts with the more general DataFrame class defined in the S4Vectors 
> package. Would it not be more appropriate to use the name "SparkDataFrame" 
> instead?






[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame

2015-12-10 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-12148:

Issue Type: Improvement  (was: Wish)

> SparkR: rename DataFrame to SparkDataFrame
> --
>
> Key: SPARK-12148
> URL: https://issues.apache.org/jira/browse/SPARK-12148
> Project: Spark
>  Issue Type: Improvement
>  Components: R, SparkR
>Reporter: Michael Lawrence
>Priority: Critical
>
> The SparkR package represents a Spark DataFrame with the class "DataFrame". 
> That conflicts with the more general DataFrame class defined in the S4Vectors 
> package. Would it not be more appropriate to use the name "SparkDataFrame" 
> instead?






[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame

2015-12-10 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-12148:

Priority: Critical  (was: Minor)

> SparkR: rename DataFrame to SparkDataFrame
> --
>
> Key: SPARK-12148
> URL: https://issues.apache.org/jira/browse/SPARK-12148
> Project: Spark
>  Issue Type: Wish
>  Components: R, SparkR
>Reporter: Michael Lawrence
>Priority: Critical
>
> The SparkR package represents a Spark DataFrame with the class "DataFrame". 
> That conflicts with the more general DataFrame class defined in the S4Vectors 
> package. Would it not be more appropriate to use the name "SparkDataFrame" 
> instead?






[jira] [Commented] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive

2015-12-02 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036960#comment-15036960
 ] 

Patrick Wendell commented on SPARK-12110:
-

Hey Andrew, could you show the exact command you are using to run this 
example? Also, if you simply download Spark 1.5.1 and run the same command 
locally rather than on your modified EC2 cluster, does it work?
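
For reference, a minimal sketch of the tokenizer example run locally against a 
plain SQLContext (so no Hive-enabled assembly is needed). This is only a 
sketch: it assumes a stock 1.5.x pyspark shell where {{sc}} is already defined.

{code}
# Minimal sketch: the tokenizer example from the description, run against a
# plain SQLContext instead of a HiveContext, so Hive support is not required.
# Assumes a pyspark 1.5.x shell where `sc` (the SparkContext) already exists.
from pyspark.sql import SQLContext
from pyspark.ml.feature import Tokenizer

sqlContext = SQLContext(sc)  # plain SQLContext; no Hive metastore involved

sentenceDataFrame = sqlContext.createDataFrame([
    (0, "Hi I heard about Spark"),
    (1, "I wish Java could use case classes"),
    (2, "Logistic,regression,models,are,neat")
], ["label", "sentence"])

tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
wordsDataFrame = tokenizer.transform(sentenceDataFrame)
for words_label in wordsDataFrame.select("words", "label").take(3):
    print(words_label)
{code}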

> spark-1.5.1-bin-hadoop2.6;  pyspark.ml.feature  Exception: ("You must build 
> Spark with Hive 
> 
>
> Key: SPARK-12110
> URL: https://issues.apache.org/jira/browse/SPARK-12110
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.5.1
> Environment: cluster created using 
> spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2
>Reporter: Andrew Davidson
>
> I am using spark-1.5.1-bin-hadoop2.6. I used 
> spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured 
> spark-env to use python3. I cannot run the tokenizer sample code. Is there a 
> workaround?
> Kind regards
> Andy
> {code}
> /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
> 658 raise Exception("You must build Spark with Hive. "
> 659 "Export 'SPARK_HIVE=true' and run "
> --> 660 "build/sbt assembly", e)
> 661 
> 662 def _get_hive_ctx(self):
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
> build/sbt assembly", Py4JJavaError('An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))
> http://spark.apache.org/docs/latest/ml-features.html#tokenizer
> from pyspark.ml.feature import Tokenizer, RegexTokenizer
> sentenceDataFrame = sqlContext.createDataFrame([
>   (0, "Hi I heard about Spark"),
>   (1, "I wish Java could use case classes"),
>   (2, "Logistic,regression,models,are,neat")
> ], ["label", "sentence"])
> tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
> wordsDataFrame = tokenizer.transform(sentenceDataFrame)
> for words_label in wordsDataFrame.select("words", "label").take(3):
>   print(words_label)
> ---
> Py4JJavaError Traceback (most recent call last)
> /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
> 654 if not hasattr(self, '_scala_HiveContext'):
> --> 655 self._scala_HiveContext = self._get_hive_ctx()
> 656 return self._scala_HiveContext
> /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self)
> 662 def _get_hive_ctx(self):
> --> 663 return self._jvm.HiveContext(self._jsc.sc())
> 664 
> /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in 
> __call__(self, *args)
> 700 return_value = get_return_value(answer, self._gateway_client, 
> None,
> --> 701 self._fqn)
> 702 
> /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>  35 try:
> ---> 36 return f(*a, **kw)
>  37 except py4j.protocol.Py4JJavaError as e:
> /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 299 'An error occurred while calling {0}{1}{2}.\n'.
> --> 300 format(target_id, '.', name), value)
> 301 else:
> Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.
> : java.lang.RuntimeException: java.io.IOException: Filesystem closed
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171)
>   at 
> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
>   at 
> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:214)
>   at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>   at py4j.commands.ConstructorCommand.execut

[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive

2015-12-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-12110:

Description: 
I am using spark-1.5.1-bin-hadoop2.6. I used 
spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured 
spark-env to use python3. I cannot run the tokenizer sample code. Is there a 
workaround?

Kind regards

Andy

{code}
/root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
658 raise Exception("You must build Spark with Hive. "
659 "Export 'SPARK_HIVE=true' and run "
--> 660 "build/sbt assembly", e)
661 
662 def _get_hive_ctx(self):

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
build/sbt assembly", Py4JJavaError('An error occurred while calling 
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))




http://spark.apache.org/docs/latest/ml-features.html#tokenizer

from pyspark.ml.feature import Tokenizer, RegexTokenizer

sentenceDataFrame = sqlContext.createDataFrame([
  (0, "Hi I heard about Spark"),
  (1, "I wish Java could use case classes"),
  (2, "Logistic,regression,models,are,neat")
], ["label", "sentence"])
tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
wordsDataFrame = tokenizer.transform(sentenceDataFrame)
for words_label in wordsDataFrame.select("words", "label").take(3):
  print(words_label)

---
Py4JJavaError Traceback (most recent call last)
/root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
654 if not hasattr(self, '_scala_HiveContext'):
--> 655 self._scala_HiveContext = self._get_hive_ctx()
656 return self._scala_HiveContext

/root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self)
662 def _get_hive_ctx(self):
--> 663 return self._jvm.HiveContext(self._jsc.sc())
664 

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in 
__call__(self, *args)
700 return_value = get_return_value(answer, self._gateway_client, 
None,
--> 701 self._fqn)
702 

/root/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
 35 try:
---> 36 return f(*a, **kw)
 37 except py4j.protocol.Py4JJavaError as e:

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in 
get_return_value(answer, gateway_client, target_id, name)
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:

Py4JJavaError: An error occurred while calling 
None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.io.IOException: Filesystem closed
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at 
org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171)
at 
org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
at 
org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1057)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:554)
at 
org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at 
org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 15 more


During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
 in (

[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive

2015-12-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-12110:

Component/s: (was: ML)
 (was: SQL)
 (was: PySpark)
 EC2

> spark-1.5.1-bin-hadoop2.6;  pyspark.ml.feature  Exception: ("You must build 
> Spark with Hive 
> 
>
> Key: SPARK-12110
> URL: https://issues.apache.org/jira/browse/SPARK-12110
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.5.1
> Environment: cluster created using 
> spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2
>Reporter: Andrew Davidson
>
> I am using spark-1.5.1-bin-hadoop2.6. I used 
> spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured 
> spark-env to use python3. I cannot run the tokenizer sample code. Is there a 
> workaround?
> Kind regards
> Andy
> /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
> 658 raise Exception("You must build Spark with Hive. "
> 659 "Export 'SPARK_HIVE=true' and run "
> --> 660 "build/sbt assembly", e)
> 661 
> 662 def _get_hive_ctx(self):
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
> build/sbt assembly", Py4JJavaError('An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))
> http://spark.apache.org/docs/latest/ml-features.html#tokenizer
> from pyspark.ml.feature import Tokenizer, RegexTokenizer
> sentenceDataFrame = sqlContext.createDataFrame([
>   (0, "Hi I heard about Spark"),
>   (1, "I wish Java could use case classes"),
>   (2, "Logistic,regression,models,are,neat")
> ], ["label", "sentence"])
> tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
> wordsDataFrame = tokenizer.transform(sentenceDataFrame)
> for words_label in wordsDataFrame.select("words", "label").take(3):
>   print(words_label)
> ---
> Py4JJavaError Traceback (most recent call last)
> /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
> 654 if not hasattr(self, '_scala_HiveContext'):
> --> 655 self._scala_HiveContext = self._get_hive_ctx()
> 656 return self._scala_HiveContext
> /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self)
> 662 def _get_hive_ctx(self):
> --> 663 return self._jvm.HiveContext(self._jsc.sc())
> 664 
> /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in 
> __call__(self, *args)
> 700 return_value = get_return_value(answer, self._gateway_client, 
> None,
> --> 701 self._fqn)
> 702 
> /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>  35 try:
> ---> 36 return f(*a, **kw)
>  37 except py4j.protocol.Py4JJavaError as e:
> /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 299 'An error occurred while calling {0}{1}{2}.\n'.
> --> 300 format(target_id, '.', name), value)
> 301 else:
> Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.
> : java.lang.RuntimeException: java.io.IOException: Filesystem closed
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171)
>   at 
> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
>   at 
> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
>   at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:214)
>   at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>   at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.

[jira] [Comment Edited] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-22 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021511#comment-15021511
 ] 

Patrick Wendell edited comment on SPARK-11903 at 11/23/15 4:29 AM:
---

I think it's simply dead code that should be deleted. SKIP_JAVA_TEST related to 
a check we did regarding whether Java 6 was being used instead of Java 7. It 
doesn't have anything to do with unit tests. Spark now requires Java 7, so the 
test has been removed, but the parser still handles that variable. It was just 
an omission that it was not deleted as part of SPARK-7733 
(https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14)
 /cc [~srowen].


was (Author: pwendell):
I think it's simply dead code. SKIP_JAVA_TEST related to a check we did 
regarding whether Java 6 was being used instead of Java 7. It doesn't have 
anything to do with unit tests. Spark now requires Java 7, so the test has been 
removed, but the parser still handles that variable. It was just an omission 
that it was not deleted as part of SPARK-7733 
(https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14)
 /cc [~srowen].

> Deprecate make-distribution.sh --skip-java-test
> ---
>
> Key: SPARK-11903
> URL: https://issues.apache.org/jira/browse/SPARK-11903
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does not 
> appear to be 
> used|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73],
>  and tests are [always 
> skipped|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L170].
>  Searching the Spark codebase for {{SKIP_JAVA_TEST}} yields no results other 
> than [this 
> one|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73].
> If this option is not needed, we should deprecate and eventually remove it.






[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-22 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021511#comment-15021511
 ] 

Patrick Wendell commented on SPARK-11903:
-

I think it's simply dead code. SKIP_JAVA_TEST related to a check we did 
regarding whether Java 6 was being used instead of Java 7. It doesn't have 
anything to do with unit tests. Spark now requires Java 7, so the test has been 
removed, but the parser still handles that variable. It was just an omission 
that it was not deleted as part of SPARK-7733 
(https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14)
 /cc [~srowen].

> Deprecate make-distribution.sh --skip-java-test
> ---
>
> Key: SPARK-11903
> URL: https://issues.apache.org/jira/browse/SPARK-11903
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does not 
> appear to be 
> used|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73],
>  and tests are [always 
> skipped|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L170].
>  Searching the Spark codebase for {{SKIP_JAVA_TEST}} yields no results other 
> than [this 
> one|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73].
> If this option is not needed, we should deprecate and eventually remove it.






[jira] [Commented] (SPARK-11326) Support for authentication and encryption in standalone mode

2015-11-09 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997448#comment-14997448
 ] 

Patrick Wendell commented on SPARK-11326:
-

There are a few related conversations here:

1. The feature set and goals of the standalone scheduler. The main goal of that 
scheduler is to make it easy for people to download and run Spark with minimal 
extra dependencies. The main difference between standalone mode and other 
schedulers is that we aren't providing support for scheduling frameworks other 
than Spark (and likely never will). Other than that, features are added on a 
case-by-case basis depending on whether there is sufficient commitment from the 
maintainers to support the feature long term.

2. Security in non-YARN modes. I would actually like to see better support for 
security in other modes of Spark, the main reason being to support the large 
number of users who are not inside Hadoop deployments. BTW, I think the existing 
security architecture of Spark makes this possible, because the concern of 
distributing a shared secret is largely decoupled from the specific security 
mechanism. But we haven't really exposed public hooks for injecting secrets. 
There is also the question of secure job submission, which is addressed in this 
JIRA. This needs some thought and probably makes sense to discuss in the Spark 
1.7 timeframe.

Overall I think some broader questions need to be answered, and it's something 
perhaps we can discuss once 1.6 is out the door as we think about 1.7.
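
For context, a minimal sketch of today's single-shared-secret setup from the 
application side. This is only a sketch: it assumes a standalone cluster whose 
master and workers were started with the same two settings, and the master URL 
is a placeholder.

{code}
# Minimal sketch of the current shared-secret model in standalone mode.
# Assumes the master and workers were started with the same
# spark.authenticate / spark.authenticate.secret values; the master URL below
# is a placeholder. The point of this JIRA is to avoid having to hand this one
# cluster-wide secret to every application and user.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://master:7077")             # placeholder standalone master
        .setAppName("auth-example")
        .set("spark.authenticate", "true")            # enable SASL authentication
        .set("spark.authenticate.secret", "s3cret"))  # the single cluster-wide secret
sc = SparkContext(conf=conf)
{code}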

> Support for authentication and encryption in standalone mode
> 
>
> Key: SPARK-11326
> URL: https://issues.apache.org/jira/browse/SPARK-11326
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components need to use the same secure 
> token for all network connections if they want to have any security ensured. 
> This ticket is intended to split the communication in standalone mode into 
> application-internal communication and scheduler communication, making it 
> more like YARN mode.
> Such refactoring will allow for the scheduler (master, workers) to use a 
> distinct secret, which will remain unknown for the users. Similarly, it will 
> allow for better security in applications, because each application will be 
> able to use a distinct secret as well. 
> By providing SASL authentication/encryption for connections between a client 
> (Client or AppClient) and the Spark Master, it becomes possible to introduce 
> pluggable authentication for the standalone deployment mode.
> h3.Improvements introduced by this patch
> This patch introduces the following changes:
> * Spark driver or submission client do not have to use the same secret as 
> workers use to communicate with Master
> * Master is able to authenticate individual clients with the following rules:
> ** When connecting to the master, the client needs to specify 
> {{spark.authenticate.secret}} which is an authentication token for the user 
> specified by {{spark.authenticate.user}} ({{sparkSaslUser}} by default)
> ** Master configuration may include additional 
> {{spark.authenticate.secrets.}} entries for specifying the 
> authentication token for particular users, or 
> {{spark.authenticate.authenticatorClass}}, which specifies an implementation 
> of an external credentials provider (which is able to retrieve the 
> authentication token for a given user).
> ** Workers authenticate with Master as default user {{sparkSaslUser}}. 
> * The authorization rules are as follows:
> ** A regular user is able to manage only his own application (the application 
> which he submitted)
> ** A regular user is not able to register or manage workers
> ** Spark default user {{sparkSaslUser}} can manage all the applications
> h3.User facing changes when running application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations env variable will overwrite conf variable if present. 
> - In all situations when a user has to pass a secret, it is better (safer) to 
> do this through env variable
> - In work modes with multiple secrets we assume encrypted communication 
> between client and master, between driver and master, between master and 
> workers
> 
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf: 
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf: 
> {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or con

[jira] [Updated] (SPARK-11236) Upgrade Tachyon dependency to 0.8.0

2015-11-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11236:

Assignee: Calvin Jia

> Upgrade Tachyon dependency to 0.8.0
> ---
>
> Key: SPARK-11236
> URL: https://issues.apache.org/jira/browse/SPARK-11236
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Calvin Jia
>Assignee: Calvin Jia
> Fix For: 1.6.0
>
>
> Update the tachyon-client dependency from 0.7.1 to 0.8.0. There are no new 
> dependencies added and no Spark-facing APIs changed.






[jira] [Resolved] (SPARK-11236) Upgrade Tachyon dependency to 0.8.0

2015-11-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-11236.
-
   Resolution: Fixed
Fix Version/s: 1.6.0

> Upgrade Tachyon dependency to 0.8.0
> ---
>
> Key: SPARK-11236
> URL: https://issues.apache.org/jira/browse/SPARK-11236
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Calvin Jia
> Fix For: 1.6.0
>
>
> Update the tachyon-client dependency from 0.7.1 to 0.8.0. There are no new 
> dependencies added and no Spark-facing APIs changed.






[jira] [Commented] (SPARK-11446) Spark 1.6 release notes

2015-11-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984776#comment-14984776
 ] 

Patrick Wendell commented on SPARK-11446:
-

I think this is redundant with the "releasenotes" tag so I am closing it.

> Spark 1.6 release notes
> ---
>
> Key: SPARK-11446
> URL: https://issues.apache.org/jira/browse/SPARK-11446
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> This is a staging location where we can keep track of changes that need to be 
> documented in the release notes.






[jira] [Closed] (SPARK-11446) Spark 1.6 release notes

2015-11-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell closed SPARK-11446.
---
Resolution: Invalid

> Spark 1.6 release notes
> ---
>
> Key: SPARK-11446
> URL: https://issues.apache.org/jira/browse/SPARK-11446
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> This is a staging location where we can keep track of changes that need to be 
> documented in the release notes.






[jira] [Commented] (SPARK-11238) SparkR: Documentation change for merge function

2015-11-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984646#comment-14984646
 ] 

Patrick Wendell commented on SPARK-11238:
-

I created SPARK-11446 and linked it here.

> SparkR: Documentation change for merge function
> ---
>
> Key: SPARK-11238
> URL: https://issues.apache.org/jira/browse/SPARK-11238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>  Labels: releasenotes
>
> As discussed in pull request: https://github.com/apache/spark/pull/9012, the 
> signature of the merge function will be changed, therefore documentation 
> change is required.






[jira] [Updated] (SPARK-11446) Spark 1.6 release notes

2015-11-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11446:

Target Version/s: 1.6.0

> Spark 1.6 release notes
> ---
>
> Key: SPARK-11446
> URL: https://issues.apache.org/jira/browse/SPARK-11446
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> This is a staging location where we can keep track of changes that need to be 
> documented in the release notes.






[jira] [Created] (SPARK-11446) Spark 1.6 release notes

2015-11-01 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-11446:
---

 Summary: Spark 1.6 release notes
 Key: SPARK-11446
 URL: https://issues.apache.org/jira/browse/SPARK-11446
 Project: Spark
  Issue Type: Task
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Michael Armbrust
Priority: Critical


This is a staging location where we can keep track of changes that need to be 
documented in the release notes.






[jira] [Comment Edited] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-25 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973510#comment-14973510
 ] 

Patrick Wendell edited comment on SPARK-10971 at 10/26/15 12:02 AM:


Reynold has sent out the vote email based on the tagged commit. Since that vote 
is likely to pass, this patch will probably be in 1.5.3.


was (Author: pwendell):
Reynold has sent out the vote email based on the original fix. Since that vote 
is likely to pass, this patch will probably be in 1.5.3.

> sparkR: RRunner should allow setting path to Rscript
> 
>
> Key: SPARK-10971
> URL: https://issues.apache.org/jira/browse/SPARK-10971
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Sun Rui
> Fix For: 1.5.3, 1.6.0
>
>
> I'm running Spark on YARN and trying to use R in cluster mode. RRunner seems 
> to just call Rscript and assumes it's on the path. But on our YARN deployment 
> R isn't installed on the nodes, so it needs to be distributed along with the 
> job and we need the ability to point to where it gets installed. sparkR in 
> client mode has the config spark.sparkr.r.command to point to Rscript; 
> RRunner should have something similar so it works in cluster mode.






[jira] [Commented] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-25 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973510#comment-14973510
 ] 

Patrick Wendell commented on SPARK-10971:
-

Reynold has sent out the vote email based on the original fix. Since that vote 
is likely to pass, this patch will probably be in 1.5.3.

> sparkR: RRunner should allow setting path to Rscript
> 
>
> Key: SPARK-10971
> URL: https://issues.apache.org/jira/browse/SPARK-10971
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Sun Rui
> Fix For: 1.5.3, 1.6.0
>
>
> I'm running Spark on YARN and trying to use R in cluster mode. RRunner seems 
> to just call Rscript and assumes it's on the path. But on our YARN deployment 
> R isn't installed on the nodes, so it needs to be distributed along with the 
> job and we need the ability to point to where it gets installed. sparkR in 
> client mode has the config spark.sparkr.r.command to point to Rscript; 
> RRunner should have something similar so it works in cluster mode.






[jira] [Updated] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript

2015-10-25 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10971:

Fix Version/s: (was: 1.5.2)
   1.5.3

> sparkR: RRunner should allow setting path to Rscript
> 
>
> Key: SPARK-10971
> URL: https://issues.apache.org/jira/browse/SPARK-10971
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Sun Rui
> Fix For: 1.5.3, 1.6.0
>
>
> I'm running Spark on YARN and trying to use R in cluster mode. RRunner seems 
> to just call Rscript and assumes it's on the path. But on our YARN deployment 
> R isn't installed on the nodes, so it needs to be distributed along with the 
> job and we need the ability to point to where it gets installed. sparkR in 
> client mode has the config spark.sparkr.r.command to point to Rscript; 
> RRunner should have something similar so it works in cluster mode.






[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page

2015-10-25 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973493#comment-14973493
 ] 

Patrick Wendell commented on SPARK-11305:
-

/cc [~srowen] for his thoughts.

> Remove Third-Party Hadoop Distributions Doc Page
> 
>
> Key: SPARK-11305
> URL: https://issues.apache.org/jira/browse/SPARK-11305
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Priority: Critical
>
> There is a fairly old page in our docs that contains a bunch of assorted 
> information regarding running Spark on Hadoop clusters. I think this page 
> should be removed and merged into other parts of the docs because the 
> information is largely redundant and somewhat outdated.
> http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
> There are four sections:
> 1. Compile-time Hadoop version - I think this information can be removed in 
> favor of what is on the "building spark" page. These days most "advanced 
> users" are building without bundling Hadoop, so I'm not sure giving them a 
> bunch of different Hadoop versions sends the right message.
> 2. Linking against Hadoop - this doesn't seem to add much beyond what is in 
> the programming guide.
> 3. Where to run Spark - redundant with the hardware provisioning guide.
> 4. Inheriting cluster configurations - I think this would be better as a 
> section at the end of the configuration page. 






[jira] [Created] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page

2015-10-25 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-11305:
---

 Summary: Remove Third-Party Hadoop Distributions Doc Page
 Key: SPARK-11305
 URL: https://issues.apache.org/jira/browse/SPARK-11305
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Priority: Critical


There is a fairly old page in our docs that contains a bunch of assorted 
information regarding running Spark on Hadoop clusters. I think this page 
should be removed and merged into other parts of the docs because the 
information is largely redundant and somewhat outdated.

http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html

There are four sections:

1. Compile-time Hadoop version - I think this information can be removed in 
favor of what is on the "building spark" page. These days most "advanced users" 
are building without bundling Hadoop, so I'm not sure giving them a bunch of 
different Hadoop versions sends the right message.

2. Linking against Hadoop - this doesn't seem to add much beyond what is in the 
programming guide.

3. Where to run Spark - redundant with the hardware provisioning guide.

4. Inheriting cluster configurations - I think this would be better as a 
section at the end of the configuration page. 






[jira] [Resolved] (SPARK-11070) Remove older releases on dist.apache.org

2015-10-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-11070.
-
Resolution: Fixed

> Remove older releases on dist.apache.org
> 
>
> Key: SPARK-11070
> URL: https://issues.apache.org/jira/browse/SPARK-11070
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Sean Owen
>Assignee: Patrick Wendell
>Priority: Trivial
> Attachments: SPARK-11070.patch
>
>
> dist.apache.org should be periodically cleaned up such that it only includes 
> the latest releases in each active minor release branch. This is to reduce 
> load on mirrors. It can probably lose the 1.2.x releases at this point. In 
> total this would clean out 6 of the 9 releases currently mirrored at 
> https://dist.apache.org/repos/dist/release/spark/ 
> All releases are always archived at archive.apache.org and continue to be 
> available. The JS behind spark.apache.org/downloads.html needs to be updated 
> to point at archive.apache.org for older releases, then.
> There won't be a pull request for this as it's strictly an update to the site 
> hosted in SVN, and the files hosted by Apache.






[jira] [Commented] (SPARK-11070) Remove older releases on dist.apache.org

2015-10-16 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961515#comment-14961515
 ] 

Patrick Wendell commented on SPARK-11070:
-

I removed them - I did leave 1.5.0 for now, but we can remove it in a bit - 
just because 1.5.1 is so new.

{code}
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.1 -m "Remving 
Spark 1.1.1 release"
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.2.1 -m "Remving 
Spark 1.2.1 release"
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.2.2 -m "Remving 
Spark 1.2.2 release"
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.3.0 -m "Remving 
Spark 1.3.0 release"
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.4.0 -m "Remving 
Spark 1.4.0 release"
{code}

> Remove older releases on dist.apache.org
> 
>
> Key: SPARK-11070
> URL: https://issues.apache.org/jira/browse/SPARK-11070
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Sean Owen
>Assignee: Patrick Wendell
>Priority: Trivial
> Attachments: SPARK-11070.patch
>
>
> dist.apache.org should be periodically cleaned up such that it only includes 
> the latest releases in each active minor release branch. This is to reduce 
> load on mirrors. It can probably lose the 1.2.x releases at this point. In 
> total this would clean out 6 of the 9 releases currently mirrored at 
> https://dist.apache.org/repos/dist/release/spark/ 
> All releases are always archived at archive.apache.org and continue to be 
> available. The JS behind spark.apache.org/downloads.html needs to be updated 
> to point at archive.apache.org for older releases, then.
> There won't be a pull request for this as it's strictly an update to the site 
> hosted in SVN, and the files hosted by Apache.






[jira] [Assigned] (SPARK-11070) Remove older releases on dist.apache.org

2015-10-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reassigned SPARK-11070:
---

Assignee: Patrick Wendell

> Remove older releases on dist.apache.org
> 
>
> Key: SPARK-11070
> URL: https://issues.apache.org/jira/browse/SPARK-11070
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Sean Owen
>Assignee: Patrick Wendell
>Priority: Trivial
> Attachments: SPARK-11070.patch
>
>
> dist.apache.org should be periodically cleaned up such that it only includes 
> the latest releases in each active minor release branch. This is to reduce 
> load on mirrors. It can probably lose the 1.2.x releases at this point. In 
> total this would clean out 6 of the 9 releases currently mirrored at 
> https://dist.apache.org/repos/dist/release/spark/ 
> All releases are always archived at archive.apache.org and continue to be 
> available. The JS behind spark.apache.org/downloads.html needs to be updated 
> to point at archive.apache.org for older releases, then.
> There won't be a pull request for this as it's strictly an update to the site 
> hosted in SVN, and the files hosted by Apache.






[jira] [Updated] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment

2015-10-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10877:

Assignee: Davies Liu

> Assertions fail straightforward DataFrame job due to word alignment
> ---
>
> Key: SPARK-10877
> URL: https://issues.apache.org/jira/browse/SPARK-10877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Matt Cheah
>Assignee: Davies Liu
> Attachments: SparkFilterByKeyTest.scala
>
>
> I have some code that I'm running in a unit test suite, and it is failing 
> with an assertion error.
> I have translated the failing JUnit test to a Scala script that I will attach 
> to the ticket. The assertion error is the following:
> {code}
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
> Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: 
> lengthInBytes must be a multiple of 8 (word-aligned)
> at 
> org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53)
> at 
> org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
> at 
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
> at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
> at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> at 
> org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> {code}
> However, it turns out that this code actually works normally and computes the 
> correct result if assertions are turned off.
> I traced the code and found that when hashUnsafeWords was called, it was 
> given a byte-length of 12, which clearly is not a multiple of 8. However, the 
> job seems to compute correctly regardless of this fact. Of course, I can’t 
> just disable assertions for my unit test though.
> A few things we need to understand:
> 1. Why is the lengthInBytes of size 12?
> 2. Is it actually a problem that the byte length is not word-aligned? If so, 
> how should we fix the byte length? If it's not a problem, why is the 
> assertion flagging a false negative?






[jira] [Updated] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11006:

Component/s: SQL

> Rename NullColumnAccess as NullColumnAccessor
> -
>
> Key: SPARK-11006
> URL: https://issues.apache.org/jira/browse/SPARK-11006
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Trivial
> Fix For: 1.6.0
>
>
> In sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala, 
> NullColumnAccess should be renamed to NullColumnAccessor so that the same 
> convention is adhered to for the accessors.






[jira] [Updated] (SPARK-11056) Improve documentation on how to build Spark efficiently

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11056:

Component/s: Documentation

> Improve documentation on how to build Spark efficiently
> ---
>
> Key: SPARK-11056
> URL: https://issues.apache.org/jira/browse/SPARK-11056
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
>Priority: Minor
> Fix For: 1.5.2, 1.6.0
>
>
> Slow build times are a common pain point for new Spark developers.  We should 
> improve the main documentation on building Spark to describe how to make 
> building Spark less painful.






[jira] [Updated] (SPARK-11081) Shade Jersey dependency to work around the compatibility issue with Jersey2

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11081:

Component/s: Build

> Shade Jersey dependency to work around the compatibility issue with Jersey2
> ---
>
> Key: SPARK-11081
> URL: https://issues.apache.org/jira/browse/SPARK-11081
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Reporter: Mingyu Kim
>
> As seen from this thread 
> (https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALte62yD8H3=2KVMiFs7NZjn929oJ133JkPLrNEj=vrx-d2...@mail.gmail.com%3E),
>  Spark is incompatible with Jersey 2, especially when Spark is embedded in an 
> application running with Jersey 2.
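As a rough, hedged illustration of the shading idea (not the actual change — Spark's build uses Maven, and the relocated package name below is made up): a shade rule would relocate the Jersey 1 classes bundled with Spark so that an embedding application can use Jersey 2 without classpath conflicts.

{code}
// Sketch only, in build.sbt terms with the sbt-assembly plugin; the target
// package "org.spark_project.jersey" is illustrative, not an existing name.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.sun.jersey.**" -> "org.spark_project.jersey.@1").inAll
)
{code}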



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11092:

Assignee: Jakob Odersky

> Add source URLs to API documentation.
> -
>
> Key: SPARK-11092
> URL: https://issues.apache.org/jira/browse/SPARK-11092
> Project: Spark
>  Issue Type: Documentation
>  Components: Build, Documentation
>Reporter: Jakob Odersky
>Assignee: Jakob Odersky
>Priority: Trivial
>
> It would be nice to have source URLs in the Spark scaladoc, similar to the 
> standard library (e.g. 
> http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).
> The fix should be really simple, just adding a line to the sbt unidoc 
> settings.
> I'll use the github repo url 
> bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH}
> Feel free to tell me if I should use something else as the base URL.
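A hedged sketch of the kind of one-liner being described, in plain sbt terms (the real change would live in Spark's unidoc settings and the exact keys may differ): pass scaladoc's -sourcepath and -doc-source-url options so each page links back to the GitHub tree for the release tag.

{code}
// Sketch only; €{FILE_PATH} is scaladoc's own placeholder for the source file path.
scalacOptions in (Compile, doc) ++= Seq(
  "-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath,
  "-doc-source-url",
  s"https://github.com/apache/spark/tree/v${version.value}/€{FILE_PATH}.scala"
)
{code}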



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11111) Fast null-safe join

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11111:

Component/s: SQL

> Fast null-safe join
> ---
>
> Key: SPARK-11111
> URL: https://issues.apache.org/jira/browse/SPARK-11111
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Today, null safe joins are executed with a Cartesian product.
> {code}
> scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain
> == Physical Plan ==
> TungstenProject [i#2,j#3,i#7,j#8]
>  Filter (i#2 <=> i#7)
>   CartesianProduct
>LocalTableScan [i#2,j#3], [[1,1]]
>LocalTableScan [i#7,j#8], [[1,1]]
> {code}
> One option is to add this rewrite to the optimizer:
> {code}
> select * 
> from t a 
> join t b 
>   on coalesce(a.i, ) = coalesce(b.i, ) AND (a.i <=> b.i)
> {code}
> Acceptance criteria: joins with only null safe equality should not result in 
> a Cartesian product.
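Purely for illustration, a hedged sketch of the rewritten query with a concrete sentinel (the literal 0 below is hypothetical; a real implementation would need a value, or a structural trick, that cannot collide with actual data), plus a quick way to confirm the Cartesian product is gone:

{code}
// Sketch only; the sentinel 0 is illustrative.
val rewritten = sqlContext.sql(
  """select *
    |from t a join t b
    |  on coalesce(a.i, 0) = coalesce(b.i, 0) and (a.i <=> b.i)
  """.stripMargin)

rewritten.explain() // expected: an equi-join (e.g. SortMergeJoin) instead of CartesianProduct + Filter
{code}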



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11115) IPv6 regression

2015-10-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958078#comment-14958078
 ] 

Patrick Wendell edited comment on SPARK-11115 at 10/15/15 12:38 AM:


The title of this says "Regression" - did it regress from a previous version? I 
am going to update the title, let me know if there is any issue.


was (Author: pwendell):
The title of this says "Regression" - did it regression from a previous 
version? I am going to update the title, let me know if there is any issue.

> IPv6 regression
> ---
>
> Key: SPARK-11115
> URL: https://issues.apache.org/jira/browse/SPARK-11115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
> Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6
>Reporter: Thomas Dudziak
>Priority: Critical
>
> When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error:
> 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:805)
>   at 
> org.apache.spark.storage.BlockManagerId.<init>(BlockManagerId.scala:48)
>   at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107)
>   at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
>   at 
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> Looking at the code in question, it seems that the code will only work for 
> IPv4 as it assumes ':' can't be part of the hostname (which it clearly can 
> for IPv6 addresses).
> Instead, the code should probably use Guava's HostAndPort class, i.e.:
>   def checkHost(host: String, message: String = "") {
> assert(!HostAndPort.fromString(host).hasPort, message)
>   }
>   def checkHostPort(hostPort: String, message: String = "") {
> assert(HostAndPort.fromString(hostPort).hasPort, message)
>   }
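A quick hedged sketch of why HostAndPort helps here: it understands bracketed IPv6 literals, so a ':' inside the address is not mistaken for a host/port separator.

{code}
import com.google.common.net.HostAndPort

// Illustration only: bracketed IPv6 literals parse cleanly, with or without a port.
assert(!HostAndPort.fromString("[2001:db8::1]").hasPort)
assert(HostAndPort.fromString("[2001:db8::1]:7077").hasPort)
assert(HostAndPort.fromString("example.com:7077").hasPort)
{code}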



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11115) Host verification is not correct for IPv6

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11115:

Summary: Host verification is not correct for IPv6  (was: IPv6 regression)

> Host verification is not correct for IPv6
> -
>
> Key: SPARK-11115
> URL: https://issues.apache.org/jira/browse/SPARK-11115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
> Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6
>Reporter: Thomas Dudziak
>Priority: Critical
>
> When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error:
> 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:805)
>   at 
> org.apache.spark.storage.BlockManagerId.<init>(BlockManagerId.scala:48)
>   at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107)
>   at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
>   at 
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> Looking at the code in question, it seems that the code will only work for 
> IPv4 as it assumes ':' can't be part of the hostname (which it clearly can 
> for IPv6 addresses).
> Instead, the code should probably use Guava's HostAndPort class, i.e.:
>   def checkHost(host: String, message: String = "") {
> assert(!HostAndPort.fromString(host).hasPort, message)
>   }
>   def checkHostPort(hostPort: String, message: String = "") {
> assert(HostAndPort.fromString(hostPort).hasPort, message)
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11115) IPv6 regression

2015-10-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958078#comment-14958078
 ] 

Patrick Wendell commented on SPARK-11115:
-

The title of this says "Regression" - did it regression from a previous 
version? I am going to update the title, let me know if there is any issue.

> IPv6 regression
> ---
>
> Key: SPARK-11115
> URL: https://issues.apache.org/jira/browse/SPARK-11115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
> Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6
>Reporter: Thomas Dudziak
>Priority: Critical
>
> When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error:
> 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.util.Utils$.checkHost(Utils.scala:805)
>   at 
> org.apache.spark.storage.BlockManagerId.<init>(BlockManagerId.scala:48)
>   at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107)
>   at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
>   at 
> org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
> Looking at the code in question, it seems that the code will only work for 
> IPv4 as it assumes ':' can't be part of the hostname (which it clearly can 
> for IPv6 addresses).
> Instead, the code should probably use Guava's HostAndPort class, i.e.:
>   def checkHost(host: String, message: String = "") {
> assert(!HostAndPort.fromString(host).hasPort, message)
>   }
>   def checkHostPort(hostPort: String, message: String = "") {
> assert(HostAndPort.fromString(hostPort).hasPort, message)
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11110) Scala 2.11 build fails due to compiler errors

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11110:

Priority: Critical  (was: Major)

> Scala 2.11 build fails due to compiler errors
> -
>
> Key: SPARK-11110
> URL: https://issues.apache.org/jira/browse/SPARK-11110
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Jakob Odersky
>Priority: Critical
>
> Right now the 2.11 build is failing due to compiler errors in SBT (though not 
> in Maven). I have updated our 2.11 compile test harness to catch this.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull
> {code}
> [error] 
> /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
>  no valid targets for annotation on value conf - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
> [error] 
> {code}
> This is one error, but there may be others past this point (the compile fails 
> fast).
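For reference, a hedged sketch of the meta-annotation form the compiler message suggests; the class names below are stand-ins, not Spark's, and whether this or simply dropping @transient is the right fix depends on how the field is actually used.

{code}
import scala.annotation.meta.param

class Conf // stand-in for SparkConf, illustration only

// Giving the annotation an explicit target avoids the 2.11
// "no valid targets for annotation" error quoted above.
class EndpointRef(@(transient @param) conf: Conf) extends Serializable
{code}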



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11110) Scala 2.11 build fails due to compiler errors

2015-10-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-11110:

Assignee: Jakob Odersky

> Scala 2.11 build fails due to compiler errors
> -
>
> Key: SPARK-11110
> URL: https://issues.apache.org/jira/browse/SPARK-11110
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Jakob Odersky
>
> Right now the 2.11 build is failing due to compiler errors in SBT (though not 
> in Maven). I have updated our 2.11 compile test harness to catch this.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull
> {code}
> [error] 
> /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
>  no valid targets for annotation on value conf - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
> [error] 
> {code}
> This is one error, but there may be others past this point (the compile fails 
> fast).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11110) Scala 2.11 build fails due to compiler errors

2015-10-14 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-11110:
---

 Summary: Scala 2.11 build fails due to compiler errors
 Key: SPARK-11110
 URL: https://issues.apache.org/jira/browse/SPARK-11110
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell


Right now the 2.11 build is failing due to compiler errors in SBT (though not 
in Maven). I have updated our 2.11 compile test harness to catch this.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull

{code}
[error] 
/home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
 no valid targets for annotation on value conf - it is discarded unused. You 
may specify targets with meta-annotations, e.g. @(transient @param)
[error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
[error] 
{code}

This is one error, but there may be others past this point (the compile fails 
fast).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6230) Provide authentication and encryption for Spark's RPC

2015-10-13 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954523#comment-14954523
 ] 

Patrick Wendell commented on SPARK-6230:


Should we update Spark's documentation to explain this? I think at present it 
only discusses encrypted RPC via akka. But this will be the new recommended way 
to encrypt RPC.

> Provide authentication and encryption for Spark's RPC
> -
>
> Key: SPARK-6230
> URL: https://issues.apache.org/jira/browse/SPARK-6230
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Reporter: Marcelo Vanzin
>
> Make sure the RPC layer used by Spark supports the auth and encryption 
> features of the network/common module.
> This kinda ignores akka; adding support for SASL to akka, while possible, 
> seems to be at odds with the direction being taken in Spark, so let's 
> restrict this to the new RPC layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Target Version/s: 1.5.1

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Priority: Critical  (was: Major)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Description: 
In 1.5.0 there are some extra classes in the Spark docs - including a bunch of 
test classes. We need to figure out what commit introduced those and fix it. 
The obvious things like genJavadoc version have not changed.

http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before]
http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after]


> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Affects Version/s: 1.5.0

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-10650:
---

 Summary: Spark docs include test and other extra classes
 Key: SPARK-10650
 URL: https://issues.apache.org/jira/browse/SPARK-10650
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Andrew Or






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6942) Umbrella: UI Visualizations for Core and Dataframes

2015-09-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6942:
---
Assignee: Andrew Or  (was: Patrick Wendell)

> Umbrella: UI Visualizations for Core and Dataframes 
> 
>
> Key: SPARK-6942
> URL: https://issues.apache.org/jira/browse/SPARK-6942
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL, Web UI
>Reporter: Patrick Wendell
>Assignee: Andrew Or
> Fix For: 1.5.0
>
>
> This is an umbrella issue for the assorted visualization proposals for 
> Spark's UI. The scope will likely cover Spark 1.4 and 1.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws NoSuchElementException when RDD is empty

2015-09-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10623:

Component/s: SQL

> turning on predicate pushdown throws NoSuchElementException when RDD is 
> empty 
> -
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ram Sriharsha
>Assignee: Zhan Zhang
>
> Turning on predicate pushdown for ORC datasources results in a 
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 < 15)
>   Scan 
> OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10511) Source releases should not include maven jars

2015-09-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10511:

Assignee: Luciano Resende

> Source releases should not include maven jars
> -
>
> Key: SPARK-10511
> URL: https://issues.apache.org/jira/browse/SPARK-10511
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Luciano Resende
>Priority: Blocker
>
> I noticed our source jars seemed really big for 1.5.0. At least one 
> contributing factor is that, likely due to some change in the release script, 
> the maven jars are being bundled in with the source code in our build 
> directory. This runs afoul of the ASF policy on binaries in source releases - 
> we should fix it in 1.5.1.
> The issue (I think) is that we might invoke maven to compute the version 
> between when we checkout Spark from github and when we package the source 
> file. I think it could be fixed by simply clearing out the build/ directory 
> after that statement runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics

2015-09-15 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745690#comment-14745690
 ] 

Patrick Wendell commented on SPARK-10620:
-

/cc [~imranr] and [~srowen] for any comments. In my mind the goal here is just 
to produce some design thoughts and not to actually do it (at this point).

> Look into whether accumulator mechanism can replace TaskMetrics
> ---
>
> Key: SPARK-10620
> URL: https://issues.apache.org/jira/browse/SPARK-10620
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>
> This task is simply to explore whether the internal representation used by 
> TaskMetrics could be implemented with accumulators rather than having two 
> separate mechanisms. Note that we need to continue to preserve the existing 
> "Task Metric" data structures that are exposed to users through event logs, 
> etc. The question is whether we can use a single internal codepath and perhaps 
> make this easier to extend in the future.
> I think a full exploration would answer the following questions:
> - How do the semantics of accumulators on stage retries differ from aggregate 
> TaskMetrics for a stage? Could we implement clearer retry semantics for 
> internal accumulators to allow them to be the same - for instance, zeroing 
> accumulator values if a stage is retried (see discussion here: SPARK-10042).
> - Are there metrics that do not fit well into the accumulator model, or would 
> be difficult to update as an accumulator.
> - If we expose metrics through accumulators in the future rather than 
> continuing to add fields to TaskMetrics, what is the best way to coerce 
> compatibility?
> - Are there any other considerations?
> - Is it worth it to do this, or is the consolidation too complicated to 
> justify?
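A rough, hedged sketch of the idea being explored: model a per-task metric as a named accumulator and let the existing accumulator machinery do the aggregation. The metric name and values below are purely illustrative.

{code}
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorAsMetricSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
    // Hypothetical metric name; not an existing internal TaskMetrics field.
    val recordsRead = sc.accumulator(0L, "internal.metrics.recordsRead")

    // Each task increments the accumulator; the driver sees the aggregated value.
    sc.parallelize(1 to 1000, numSlices = 4).foreach(_ => recordsRead += 1L)
    println(s"aggregated across tasks: ${recordsRead.value}")
    sc.stop()
  }
}
{code}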



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics

2015-09-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10620:

Description: 
This task is simply to explore whether the internal representation used by 
TaskMetrics could be implemented with accumulators rather than having two 
separate mechanisms. Note that we need to continue to preserve the existing 
"Task Metric" data structures that are exposed to users through event logs, etc. 
The question is whether we can use a single internal codepath and perhaps make this 
easier to extend in the future.

I think a full exploration would answer the following questions:
- How do the semantics of accumulators on stage retries differ from aggregate 
TaskMetrics for a stage? Could we implement clearer retry semantics for 
internal accumulators to allow them to be the same - for instance, zeroing 
accumulator values if a stage is retried (see discussion here: SPARK-10042).
- Are there metrics that do not fit well into the accumulator model, or would 
be difficult to update as an accumulator.
- If we expose metrics through accumulators in the future rather than 
continuing to add fields to TaskMetrics, what is the best way to coerce 
compatibility?
- Are there any other considerations?
- Is it worth it to do this, or is the consolidation too complicated to justify?

  was:
This task is simply to explore whether the internal representation used by 
TaskMetrics could be implemented with accumulators rather than having two 
separate mechanisms. Note that we need to continue to preserve the existing 
"Task Metric" data structures that are exposed to users through event logs, etc. 
The question is whether we can use a single internal codepath and perhaps make this 
easier to extend in the future.

I think there are a few things to look into:
- How do the semantics of accumulators on stage retries differ from aggregate 
TaskMetrics for a stage? Could we implement clearer retry semantics for 
internal accumulators to allow them to be the same - for instance, zeroing 
accumulator values if a stage is retried (see discussion here: SPARK-10042).
- Are there metrics that do not fit well into the accumulator model, or would 
be difficult to update as an accumulator.
- If we expose metrics through accumulators in the future rather than 
continuing to add fields to TaskMetrics, what is the best way to coerce 
compatibility?
- Is it worth it to do this, or is the consolidation too complicated to justify?


> Look into whether accumulator mechanism can replace TaskMetrics
> ---
>
> Key: SPARK-10620
> URL: https://issues.apache.org/jira/browse/SPARK-10620
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>
> This task is simply to explore whether the internal representation used by 
> TaskMetrics could be implemented with accumulators rather than having two 
> separate mechanisms. Note that we need to continue to preserve the existing 
> "Task Metric" data structures that are exposed to users through event logs, 
> etc. The question is whether we can use a single internal codepath and perhaps 
> make this easier to extend in the future.
> I think a full exploration would answer the following questions:
> - How do the semantics of accumulators on stage retries differ from aggregate 
> TaskMetrics for a stage? Could we implement clearer retry semantics for 
> internal accumulators to allow them to be the same - for instance, zeroing 
> accumulator values if a stage is retried (see discussion here: SPARK-10042).
> - Are there metrics that do not fit well into the accumulator model, or would 
> be difficult to update as an accumulator.
> - If we expose metrics through accumulators in the future rather than 
> continuing to add fields to TaskMetrics, what is the best way to coerce 
> compatibility?
> - Are there any other considerations?
> - Is it worth it to do this, or is the consolidation too complicated to 
> justify?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics

2015-09-15 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-10620:
---

 Summary: Look into whether accumulator mechanism can replace 
TaskMetrics
 Key: SPARK-10620
 URL: https://issues.apache.org/jira/browse/SPARK-10620
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Andrew Or


This task is simply to explore whether the internal representation used by 
TaskMetrics could be implemented with accumulators rather than having two 
separate mechanisms. Note that we need to continue to preserve the existing 
"Task Metric" data structures that are exposed to users through event logs, etc. 
The question is whether we can use a single internal codepath and perhaps make this 
easier to extend in the future.

I think there are a few things to look into:
- How do the semantics of accumulators on stage retries differ from aggregate 
TaskMetrics for a stage? Could we implement clearer retry semantics for 
internal accumulators to allow them to be the same - for instance, zeroing 
accumulator values if a stage is retried (see discussion here: SPARK-10042).
- Are there metrics that do not fit well into the accumulator model, or would 
be difficult to update as an accumulator.
- If we expose metrics through accumulators in the future rather than 
continuing to add fields to TaskMetrics, what is the best way to coerce 
compatibility?
- Is it worth it to do this, or is the consolidation too complicated to justify?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10600) SparkSQL - Support for Not Exists in a Correlated Subquery

2015-09-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10600:

Component/s: SQL

> SparkSQL - Support for Not Exists in a Correlated Subquery
> --
>
> Key: SPARK-10600
> URL: https://issues.apache.org/jira/browse/SPARK-10600
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Richard Garris
>
> Spark SQL currently does not support NOT EXISTS clauses (e.g. 
> SELECT * FROM TABLE_A WHERE NOT EXISTS ( SELECT 1 FROM TABLE_B where 
> TABLE_B.id = TABLE_A.id))
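For readers hitting this, a hedged sketch of a common workaround at the time (table and column names are taken from the example above and assumed to exist): emulate NOT EXISTS with a left outer join plus an IS NULL filter.

{code}
// Sketch only; assumes TABLE_A and TABLE_B are registered and both have an id column.
val notExists = sqlContext.sql(
  """SELECT a.*
    |FROM TABLE_A a
    |LEFT OUTER JOIN (SELECT DISTINCT id FROM TABLE_B) b ON b.id = a.id
    |WHERE b.id IS NULL
  """.stripMargin)
{code}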



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10601) Spark SQL - Support for MINUS

2015-09-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10601:

Component/s: SQL

> Spark SQL - Support for MINUS
> -
>
> Key: SPARK-10601
> URL: https://issues.apache.org/jira/browse/SPARK-10601
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Richard Garris
>
> Spark SQL does not currently support the SQL MINUS operator.
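As a hedged aside, the equivalent set-difference operation is already reachable through the DataFrame API, which may serve as an interim workaround; the table names below are illustrative.

{code}
// Sketch only: MINUS/EXCEPT semantics via DataFrame.except.
val onlyInA = sqlContext.table("table_a").except(sqlContext.table("table_b"))
onlyInA.show()
{code}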



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10576) Move .java files out of src/main/scala

2015-09-12 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742280#comment-14742280
 ] 

Patrick Wendell commented on SPARK-10576:
-

FWIW - seems to me like moving them into /java makes sense. If we are going to 
have src/main/scala and src/main/java, we might as well use them correctly. What 
do you think, [~rxin]?

> Move .java files out of src/main/scala
> --
>
> Key: SPARK-10576
> URL: https://issues.apache.org/jira/browse/SPARK-10576
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.5.0
>Reporter: Sean Owen
>Priority: Minor
>
> (I suppose I'm really asking for an opinion on this, rather than asserting it 
> must be done, but seems worthwhile. CC [~rxin] and [~pwendell])
> As pointed out on the mailing list, there are some Java files in the Scala 
> source tree:
> {code}
> ./bagel/src/main/scala/org/apache/spark/bagel/package-info.java
> ./core/src/main/scala/org/apache/spark/annotation/AlphaComponent.java
> ./core/src/main/scala/org/apache/spark/annotation/DeveloperApi.java
> ./core/src/main/scala/org/apache/spark/annotation/Experimental.java
> ./core/src/main/scala/org/apache/spark/annotation/package-info.java
> ./core/src/main/scala/org/apache/spark/annotation/Private.java
> ./core/src/main/scala/org/apache/spark/api/java/package-info.java
> ./core/src/main/scala/org/apache/spark/broadcast/package-info.java
> ./core/src/main/scala/org/apache/spark/executor/package-info.java
> ./core/src/main/scala/org/apache/spark/io/package-info.java
> ./core/src/main/scala/org/apache/spark/rdd/package-info.java
> ./core/src/main/scala/org/apache/spark/scheduler/package-info.java
> ./core/src/main/scala/org/apache/spark/serializer/package-info.java
> ./core/src/main/scala/org/apache/spark/util/package-info.java
> ./core/src/main/scala/org/apache/spark/util/random/package-info.java
> ./external/flume/src/main/scala/org/apache/spark/streaming/flume/package-info.java
> ./external/kafka/src/main/scala/org/apache/spark/streaming/kafka/package-info.java
> ./external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/package-info.java
> ./external/twitter/src/main/scala/org/apache/spark/streaming/twitter/package-info.java
> ./external/zeromq/src/main/scala/org/apache/spark/streaming/zeromq/package-info.java
> ./graphx/src/main/scala/org/apache/spark/graphx/impl/EdgeActiveness.java
> ./graphx/src/main/scala/org/apache/spark/graphx/lib/package-info.java
> ./graphx/src/main/scala/org/apache/spark/graphx/package-info.java
> ./graphx/src/main/scala/org/apache/spark/graphx/TripletFields.java
> ./graphx/src/main/scala/org/apache/spark/graphx/util/package-info.java
> ./mllib/src/main/scala/org/apache/spark/ml/attribute/package-info.java
> ./mllib/src/main/scala/org/apache/spark/ml/package-info.java
> ./mllib/src/main/scala/org/apache/spark/mllib/package-info.java
> ./sql/catalyst/src/main/scala/org/apache/spark/sql/types/SQLUserDefinedType.java
> ./sql/hive/src/main/scala/org/apache/spark/sql/hive/package-info.java
> ./streaming/src/main/scala/org/apache/spark/streaming/api/java/package-info.java
> ./streaming/src/main/scala/org/apache/spark/streaming/dstream/package-info.java
> ./streaming/src/main/scala/org/apache/spark/streaming/StreamingContextState.java
> {code}
> It happens to work since the Scala compiler plugin is handling both.
> On its face, they should be in the Java source tree. I'm trying to figure out 
> if there are good reasons they have to be in this less intuitive location.
> I might try moving them just to see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10511) Source releases should not include maven jars

2015-09-08 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-10511:
---

 Summary: Source releases should not include maven jars
 Key: SPARK-10511
 URL: https://issues.apache.org/jira/browse/SPARK-10511
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.5.0
Reporter: Patrick Wendell
Priority: Blocker


I noticed our source jars seemed really big for 1.5.0. At least one 
contributing factor is that, likely due to some change in the release script, 
the maven jars are being bundled in with the source code in our build 
directory. This runs afoul of the ASF policy on binaries in source releases - 
we should fix it in 1.5.1.

The issue (I think) is that we might invoke maven to compute the version 
between when we checkout Spark from github and when we package the source file. 
I think it could be fixed by simply clearing out the build/ directory after 
that statement runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4123) Show dependency changes in pull requests

2015-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4123.

Resolution: Duplicate

I've proposed a slightly different approach in SPARK-10359, so I'm closing this 
since there is high overlap.

> Show dependency changes in pull requests
> 
>
> Key: SPARK-4123
> URL: https://issues.apache.org/jira/browse/SPARK-4123
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Patrick Wendell
>Assignee: Brennon York
>Priority: Critical
>
> We should inspect the classpath of Spark's assembly jar for every pull 
> request. This only takes a few seconds in Maven and it will help weed out 
> dependency changes from the master branch. Ideally we'd post any dependency 
> changes in the pull request message.
> {code}
> $ mvn -Phive -Phadoop-2.4 dependency:build-classpath -pl assembly  | grep -v 
> INFO | tr : "\n" | awk -F/ '{print $NF}' | sort > my-classpath
> $ git checkout apache/master
> $ mvn -Phive -Phadoop-2.4 dependency:build-classpath -pl assembly  | grep -v 
> INFO | tr : "\n" | awk -F/ '{print $NF}' | sort > master-classpath
> $ diff my-classpath master-classpath
> < chill-java-0.3.6.jar
> < chill_2.10-0.3.6.jar
> ---
> > chill-java-0.5.0.jar
> > chill_2.10-0.5.0.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests

2015-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723844#comment-14723844
 ] 

Patrick Wendell commented on SPARK-10359:
-

The approach in SPARK-4123 was a bit different, but there is some overlap. We 
ended up reverting that patch because it wasn't working consistently. I'll 
close that one as a dup of this one.

> Enumerate Spark's dependencies in a file and diff against it for new pull 
> requests 
> ---
>
> Key: SPARK-10359
> URL: https://issues.apache.org/jira/browse/SPARK-10359
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>
> Sometimes when we have dependency changes it can be pretty unclear what 
> transitive set of things are changing. If we enumerate all of the 
> dependencies and put them in a source file in the repo, we can make it so 
> that it is very explicit what is changing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723792#comment-14723792
 ] 

Patrick Wendell commented on SPARK-10374:
-

Hey Matt,

I think the only thing that could have influenced you is that we changed our 
default advertised akka dependency. We used to advertise an older version of 
akka that shaded protobuf. What happens if you manually coerce that version of 
akka in your application?

Spark itself doesn't directly use protobuf. But some of our dependencies do, 
including both akka and Hadoop. My guess is that you are now in a situation 
where you can't reconcile the akka and hadoop protobuf versions and make them 
both happy. This would be consistent with the changes we made in 1.5 in 
SPARK-7042.

The fix would be to exclude all com.typesafe.akka artifacts from Spark and 
manually add org.spark-project.akka to your build.

However, since you didn't post a full stack trace, I can't know for sure 
whether it is akka that complains when you try to fix the protobuf version at 
2.4.
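A hedged sbt-style sketch of that workaround (the reporter uses Gradle, but the shape is the same; the versions shown are illustrative, not a recommendation):

{code}
// Sketch only; adapt coordinates and versions to the actual build.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "1.5.0")
    .excludeAll(ExclusionRule(organization = "com.typesafe.akka")),
  "org.spark-project.akka" %% "akka-remote" % "2.3.4-spark"
)
{code}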

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on 
> protobuf-2.4
> -
>
> Key: SPARK-10374
> URL: https://issues.apache.org/jira/browse/SPARK-10374
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that 
> depends on the Spark 1.5.0 libraries via Gradle, and Hadoop 2.0.0 libraries. 
> When I run the driver application, I can hit the following error:
> {code}
> … java.lang.UnsupportedOperationException: This is 
> supposed to be overridden by subclasses.
> at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
> at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and 
> thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 
> 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 
> 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark 
> modules. It appears that Spark used to shade its protobuf dependencies and 
> hence Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However 
> when I ran dependencyInsight again against Spark 1.5, it looks like 
> protobuf is no longer shaded in the Spark module.
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |\--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> | +--- compile
> | \--- org.apache.spark:spark-core_2.10:1.4.1
> |  +--- compile
> |  +--- org.apache.spark:spark-sql_2.10:1.4.1
> |  |\--- compile
> |  \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>  \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>  \--- org.apache.spark:spark-core_2.10:1.4.1
>   +--- compile
>   +--- org.apache.spark:spark-sql_2.10:1.4.1
>   |\--- compile
>   \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>\--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>  \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>   +--- compile
>   +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>   |\--- compile
>   \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>\--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |\--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> | +--- compile
> | \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |  +--- compile
> |  +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |  |\--- compile
> |  \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |   \--- org.ap

[jira] [Created] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests

2015-08-30 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-10359:
---

 Summary: Enumerate Spark's dependencies in a file and diff against 
it for new pull requests 
 Key: SPARK-10359
 URL: https://issues.apache.org/jira/browse/SPARK-10359
 Project: Spark
  Issue Type: New Feature
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell


Sometimes when we have dependency changes it can be pretty unclear what 
transitive set of things are changing. If we enumerate all of the dependencies 
and put them in a source file in the repo, we can make it so that it is very 
explicit what is changing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9545) Run Maven tests in pull request builder if title has "[maven-test]" in it

2015-08-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-9545.

   Resolution: Fixed
Fix Version/s: 1.6.0

> Run Maven tests in pull request builder if title has "[maven-test]" in it
> -
>
> Key: SPARK-9545
> URL: https://issues.apache.org/jira/browse/SPARK-9545
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
> Fix For: 1.6.0
>
>
> We have infrastructure now in the build tooling for running maven tests, but 
> it's not actually used anywhere. With a very minor change we can support 
> running maven tests if the pull request title has "maven-test" in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has "[test-maven]" in it

2015-08-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-9545:
---
Summary: Run Maven tests in pull request builder if title has 
"[test-maven]" in it  (was: Run Maven tests in pull request builder if title 
has "[maven-test]" in it)

> Run Maven tests in pull request builder if title has "[test-maven]" in it
> -
>
> Key: SPARK-9545
> URL: https://issues.apache.org/jira/browse/SPARK-9545
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
> Fix For: 1.6.0
>
>
> We have infrastructure now in the build tooling for running maven tests, but 
> it's not actually used anywhere. With a very minor change we can support 
> running maven tests if the pull request title has "maven-test" in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9547) Allow testing pull requests with different Hadoop versions

2015-08-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-9547.

   Resolution: Fixed
Fix Version/s: 1.6.0

> Allow testing pull requests with different Hadoop versions
> --
>
> Key: SPARK-9547
> URL: https://issues.apache.org/jira/browse/SPARK-9547
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
> Fix For: 1.6.0
>
>
> Similar to SPARK-9545 we should allow testing different Hadoop profiles in 
> the PRB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]

2015-08-10 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680885#comment-14680885
 ] 

Patrick Wendell commented on SPARK-7726:


[~srowen] [~dragos] This is cropping up again when trying to create a release 
candidate for Spark 1.5:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Release-All-Java7/26/console

> Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]
> -
>
> Key: SPARK-7726
> URL: https://issues.apache.org/jira/browse/SPARK-7726
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Iulian Dragos
>Priority: Blocker
> Fix For: 1.4.0
>
>
> This one took a long time to track down. The Maven install phase is part of 
> our release process. It runs the "scala:doc" target to generate doc jars. 
> Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in 
> a way that breaks our build. In both cases, it returned an error (there has 
> been a long-running error here that we've always ignored); however, in 2.11.3 
> that error became fatal and failed the entire build process. The upgrade 
> occurred in SPARK-7092. Here is a simple reproduction:
> {code}
> ./dev/change-version-to-2.11.sh
> mvn clean install -pl network/common -pl network/shuffle -DskipTests 
> -Dscala-2.11
> {code} 
> This command exits success when Spark is at Scala 2.11.2 and fails with 
> 2.11.3 or higher. In either case an error is printed:
> {code}
> [INFO] 
> [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
> spark-network-shuffle_2.11 ---
> /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
>  error: not found: type Type
>   protected Type type() { return Type.UPLOAD_BLOCK; }
> ^
> /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
>  error: not found: type Type
>   protected Type type() { return Type.STREAM_HANDLE; }
> ^
> /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
>  error: not found: type Type
>   protected Type type() { return Type.REGISTER_EXECUTOR; }
> ^
> /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
>  error: not found: type Type
>   protected Type type() { return Type.OPEN_BLOCKS; }
> ^
> model contains 22 documentable templates
> four errors found
> {code}
> Ideally we'd just dig in and fix this error. Unfortunately it's a very 
> confusing error and I have no idea why it is appearing. I'd propose reverting 
> SPARK-7092 in the mean time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-08-06 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660796#comment-14660796
 ] 

Patrick Wendell commented on SPARK-1517:


Hey Ryan,

IIRC - the Apache snapshot repository won't let us publish binaries that do not 
have SNAPSHOT in the version number. The reason is it expects to see 
timestamped snapshots so its garbage collection mechanism can work. We could 
look at adding sha1 hashes before SNAPSHOT, but I think there is some chance 
this would break their cleanup.

In terms of posting more binaries - I can look at whether Databricks or 
Berkeley might be able to donate S3 resources for this, but it would have to be 
clearly maintained by those organizations and not branded as official Apache 
releases or anything like that.

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> --
>
> Key: SPARK-1517
> URL: https://issues.apache.org/jira/browse/SPARK-1517
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Critical
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is to set up credentials so that jenkins can publish this 
> stuff somewhere on apache infra.
> Ideally we don't want to have to put a private key on every jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the jenkins build can download the credentials provided we set a passphrase 
> in an environment variable in jenkins. There may be simpler solutions as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-08-06 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660420#comment-14660420
 ] 

Patrick Wendell commented on SPARK-1517:


Hey Ryan,

For the maven snapshot releases - unfortunately we are constrained by Maven's 
own SNAPSHOT version format, which doesn't allow encoding anything other than 
the timestamp; it's just not supported in their SNAPSHOT mechanism. However, 
one thing we could look into is whether we can align the timestamp with the 
time of the actual Spark commit, rather than the time of publication of the 
SNAPSHOT release. I'm not sure if Maven lets you provide a custom timestamp 
when publishing. If we had that feature, users could look at the Spark commit 
log and do some manual association.
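
As a rough sketch of the manual association that would enable (the qualifier 
value is made up; the parsing below just shows how a user could map a snapshot 
timestamp back to a commit time):

{code}
import java.time.format.DateTimeFormatter
import java.time.{LocalDateTime, ZoneOffset}

// Maven timestamps deployed snapshots with a qualifier like "20150806.031945-1"
// (a UTC timestamp plus a build counter). If that timestamp were aligned with the
// Spark commit time, a user could compare it against `git log` to find the commit.
val qualifier   = "20150806.031945-1"                       // illustrative, not a real build
val fmt         = DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss")
val publishedAt = LocalDateTime.parse(qualifier.takeWhile(_ != '-'), fmt).toInstant(ZoneOffset.UTC)
println(publishedAt)                                        // 2015-08-06T03:19:45Z
{code}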

For the binaries, the reason why the same commit appears multiple times is that 
we do the build every four hours and always publish the latest one even if it's 
a duplicate. However, this could be modified pretty easily to just avoid 
double-publishing the same commit if there hasn't been any code change. Maybe 
create a JIRA for this?
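
A minimal sketch of that check (the marker file and its location are 
assumptions, not part of the actual nightly scripts):

{code}
import java.nio.file.{Files, Paths}
import scala.sys.process._

// Remember the last published commit and skip the publish step when HEAD has
// not moved since the previous four-hour cycle.
val head   = Process(Seq("git", "rev-parse", "HEAD")).!!.trim
val marker = Paths.get("last-published-sha.txt")            // assumed marker file
val last   = if (Files.exists(marker)) new String(Files.readAllBytes(marker), "UTF-8").trim else ""

if (head == last) {
  println(s"No new commits since $last; skipping this publish cycle.")
} else {
  // ... run the existing packaging and publishing steps here ...
  Files.write(marker, head.getBytes("UTF-8"))
}
{code}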

In terms of how many older versions are available, the scripts we use for this 
have a tunable retention window. Right now I'm only keeping the last 4 builds; 
we could probably extend it to something like 10 builds. However, at some point 
I'm likely to run out of space in my ASF user account. Since the binaries are 
quite large, I don't think it's feasible, at least using ASF infrastructure, to 
keep all past builds. We have 3000 commits in a typical Spark release, and it's 
a few gigs for each binary build.
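
Back-of-the-envelope numbers behind that (the per-build size is a rough 
assumption):

{code}
val gbPerBuild        = 3.0                                 // "a few gigs" per binary build
val buildsRetained    = 10                                  // proposed retention window
val commitsPerRelease = 3000
println(f"retain $buildsRetained builds: ~${buildsRetained * gbPerBuild}%.0f GB")
println(f"retain every commit: ~${commitsPerRelease * gbPerBuild / 1000}%.0f TB")
{code}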

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> --
>
> Key: SPARK-1517
> URL: https://issues.apache.org/jira/browse/SPARK-1517
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Critical
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is to set up credentials so that jenkins can publish this 
> stuff somewhere on apache infra.
> Ideally we don't want to have to put a private key on every jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the jenkins build can download the credentials provided we set a passphrase 
> in an environment variable in jenkins. There may be simpler solutions as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9547) Allow testing pull requests with different Hadoop versions

2015-08-02 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9547:
--

 Summary: Allow testing pull requests with different Hadoop versions
 Key: SPARK-9547
 URL: https://issues.apache.org/jira/browse/SPARK-9547
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell


Similar to SPARK-9545 we should allow testing different Hadoop profiles in the 
PRB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has "[maven-test]" in it

2015-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-9545:
---
Issue Type: Improvement  (was: Bug)

> Run Maven tests in pull request builder if title has "[maven-test]" in it
> -
>
> Key: SPARK-9545
> URL: https://issues.apache.org/jira/browse/SPARK-9545
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>
> We have infrastructure now in the build tooling for running maven tests, but 
> it's not actually used anywhere. With a very minor change we can support 
> running maven tests if the pull request title has "maven-test" in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9545) Run Maven tests in pull request builder if title has "[maven-test]" in it

2015-08-02 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9545:
--

 Summary: Run Maven tests in pull request builder if title has 
"[maven-test]" in it
 Key: SPARK-9545
 URL: https://issues.apache.org/jira/browse/SPARK-9545
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell


We have infrastructure now in the build tooling for running maven tests, but 
it's not actually used anywhere. With a very minor change we can support 
running maven tests if the pull request title has "maven-test" in it.
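
A minimal sketch of the gating logic (the environment variable name is an 
assumption about the Jenkins pull request builder setup, not a confirmed 
interface):

{code}
// Switch the pull request build over to Maven when the PR title opts in.
val prTitle       = sys.env.getOrElse("ghprbPullTitle", "") // assumed PRB-provided variable
val runMavenTests = prTitle.toLowerCase.contains("[maven-test]")

if (runMavenTests) {
  println("PR title contains [maven-test]; running the test suite with Maven.")
} else {
  println("No [maven-test] tag in the PR title; using the default sbt build.")
}
{code}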



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9423) Why do every other spark comiter keep suggesting to use spark-submit script

2015-07-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-9423.

Resolution: Invalid

> Why do every other spark comiter keep suggesting to use spark-submit script
> ---
>
> Key: SPARK-9423
> URL: https://issues.apache.org/jira/browse/SPARK-9423
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 1.3.1
>Reporter: nirav patel
>
> I see that on the Spark forum and Stack Overflow people keep suggesting the 
> spark-submit.sh script as the way (the only way) to launch Spark jobs. Are we 
> still living in a monolithic application-server world where I need to run 
> startup.sh? What if the Spark application is a long-running context that 
> serves multiple requests? What if users just don't want to use a script? They 
> want to embed Spark as a service in their application. 
> Please STOP suggesting that users use the spark-submit script as the only option. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9423) Why do every other spark comiter keep suggesting to use spark-submit script

2015-07-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645495#comment-14645495
 ] 

Patrick Wendell commented on SPARK-9423:


This is not a valid issue for JIRA (we use JIRA for project bugs and feature 
tracking). Please send an email to the spark-users list. Thanks.

> Why do every other spark comiter keep suggesting to use spark-submit script
> ---
>
> Key: SPARK-9423
> URL: https://issues.apache.org/jira/browse/SPARK-9423
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Affects Versions: 1.3.1
>Reporter: nirav patel
>
> I see that on the Spark forum and Stack Overflow people keep suggesting the 
> spark-submit.sh script as the way (the only way) to launch Spark jobs. Are we 
> still living in a monolithic application-server world where I need to run 
> startup.sh? What if the Spark application is a long-running context that 
> serves multiple requests? What if users just don't want to use a script? They 
> want to embed Spark as a service in their application. 
> Please STOP suggesting that users use the spark-submit script as the only option. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9335) Kinesis test hits rate limit

2015-07-24 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9335:
--

 Summary: Kinesis test hits rate limit
 Key: SPARK-9335
 URL: https://issues.apache.org/jira/browse/SPARK-9335
 Project: Spark
  Issue Type: Bug
  Components: Streaming, Tests
Reporter: Patrick Wendell
Assignee: Tathagata Das
Priority: Critical


This test is failing many pull request builds because of rate limits:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38396/testReport/org.apache.spark.streaming.kinesis/KinesisBackedBlockRDDSuite/_It_is_not_a_test_/

I disabled the test. I wonder if it's better to not have this test run by 
default, since it's a bit brittle to depend on an external system like this 
(if Kinesis goes down, for instance, it will block all development).
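
A sketch of that opt-in approach (the variable name is an assumption for 
illustration):

{code}
// Keep the Kinesis suite out of the default run so an external-service outage or
// rate limit cannot block unrelated pull requests.
val kinesisEnabled = sys.env.getOrElse("ENABLE_KINESIS_TESTS", "0") == "1"

if (kinesisEnabled) {
  println("Kinesis tests enabled; this run will talk to the external service.")
} else {
  println("Kinesis tests skipped; set ENABLE_KINESIS_TESTS=1 to opt in.")
}
{code}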



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8455) Implement N-Gram Feature Transformer

2015-07-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8455:
---
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

> Implement N-Gram Feature Transformer
> 
>
> Key: SPARK-8455
> URL: https://issues.apache.org/jira/browse/SPARK-8455
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Feynman Liang
>Assignee: Feynman Liang
>Priority: Minor
> Fix For: 1.5.0
>
>
> N-grams are an NLP feature representation which generalizes bag of words to 
> include local context (the n-1 preceding words). We can implement N-grams in 
> ML as a feature transformer (likely directly after tokenization).
> For example, "this is a test" should tokenize to ["this","is","a","test"], 
> which upon applying a 2-gram feature transform should yield 
> [["this","is"],["is","a"],["a","test"]].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8471) Implement Discrete Cosine Transform feature transformer

2015-07-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8471:
---
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

> Implement Discrete Cosine Transform feature transformer
> ---
>
> Key: SPARK-8471
> URL: https://issues.apache.org/jira/browse/SPARK-8471
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Feynman Liang
>Assignee: Feynman Liang
>Priority: Minor
> Fix For: 1.5.0
>
>
> Discrete cosine transform (DCT) is an invertible matrix transformation 
> commonly used to analyze signals (e.g. audio, images, video) in the frequency 
> domain. In contrast to the FFT, the DCT maps real vectors to real vectors. 
> The DCT is oftentimes used to provide an alternative feature representation 
> (e.g. spectrogram representations of audio and video) useful for 
> classification and frequency-domain analysis.
> Ideally, an implementation of the DCT should allow both forward and inverse 
> transforms. It should also work for any numeric datatype and both 1D and 2D 
> data.
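
For illustration, a naive DCT-II with orthonormal scaling (an O(n^2) loop; the 
eventual feature would presumably delegate to an optimized implementation):

{code}
import scala.math.{cos, sqrt, Pi}

// X(k) = scale(k) * sum_i x(i) * cos(pi/n * (i + 0.5) * k), the standard DCT-II.
def dct2(x: Array[Double]): Array[Double] = {
  val n = x.length
  Array.tabulate(n) { k =>
    val scale = if (k == 0) sqrt(1.0 / n) else sqrt(2.0 / n)
    scale * (0 until n).map(i => x(i) * cos(Pi / n * (i + 0.5) * k)).sum
  }
}

println(dct2(Array(1.0, 2.0, 3.0, 4.0)).mkString(", "))
{code}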



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector

2015-07-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8703:
---
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

> Add CountVectorizer as a ml transformer to convert document to words count 
> vector
> -
>
> Key: SPARK-8703
> URL: https://issues.apache.org/jira/browse/SPARK-8703
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: yuhao yang
>Assignee: yuhao yang
> Fix For: 1.5.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Converts a text document to a sparse vector of token counts. Similar to 
> http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
> I can further add an estimator to extract vocabulary from corpus if that's 
> appropriate.
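
A minimal sketch of the idea in plain Scala (the index assignment and types are 
illustrative, not the proposed ml API):

{code}
val doc    = Seq("spark", "is", "fast", "and", "spark", "is", "general")
val vocab  = doc.distinct.zipWithIndex.toMap                // term -> column index
val counts = doc.groupBy(identity).map { case (term, occs) => vocab(term) -> occs.size.toDouble }
// a sparse (index -> count) view of the document; "spark" and "is" both map to 2.0
println(counts.toSeq.sortBy(_._1).mkString(", "))
{code}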



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9036) SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol

2015-07-24 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640980#comment-14640980
 ] 

Patrick Wendell commented on SPARK-9036:


I'm actually a bit confused about what the use case is here - [~rdub], could you 
give some more detail? I'm confused because, AFAIK, the code that actually 
converts the classes to JSON is private, so it's not possible for other people 
to use it directly. The patch that was merged to implement this feature adds an 
internal API that is not used inside of Spark, which is a bit strange.

Are you writing your own classes inside of the Spark namespace and then calling 
into the JsonProtocol directly?

> SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol
> 
>
> Key: SPARK-9036
> URL: https://issues.apache.org/jira/browse/SPARK-9036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.4.0, 1.4.1
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.5.0
>
>
> The JsonProtocol added in SPARK-3454 [doesn't 
> include|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L95-L96]
>  code for ser/de of 
> [{{SparkListenerExecutorMetricsUpdate}}|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L107-L110]
>  messages.
> The comment notes that they are "not used", which presumably refers to the 
> fact that the [{{EventLoggingListener}} doesn't write these 
> events|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L200-L201].
> However, individual listeners can and should make that determination for 
> themselves; I have recently written custom listeners that would like to 
> consume metrics-update messages as JSON, so it would be nice to round out the 
> JsonProtocol implementation by supporting them.
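
A sketch of the kind of custom listener described above; it only logs the 
executor id, since serializing the full event to JSON is precisely the 
JsonProtocol support this ticket asks for:

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

// Custom listener that reacts to metrics-update events as they arrive.
class MetricsUpdateLogger extends SparkListener {
  override def onExecutorMetricsUpdate(update: SparkListenerExecutorMetricsUpdate): Unit = {
    println(s"metrics update received from executor ${update.execId}")
  }
}

// Registered via sc.addSparkListener(new MetricsUpdateLogger()) on an existing SparkContext.
{code}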



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9304) Improve backwards compatibility of SPARK-8401

2015-07-23 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9304:
--

 Summary: Improve backwards compatibility of SPARK-8401
 Key: SPARK-9304
 URL: https://issues.apache.org/jira/browse/SPARK-9304
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Michael Allman
Priority: Critical


In SPARK-8401 a backwards-incompatible change was made to the Scala 2.11 build 
process. It would be good to add scripts with the older names to avoid breaking 
compatibility for harnesses or other automated builds that build for Scala 
2.11. They can just be one-line shell scripts with a comment explaining that 
they exist for backwards-compatibility purposes.

/cc [~srowen]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8564) Add the Python API for Kinesis

2015-07-23 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8564:
---
Target Version/s: 1.5.0

> Add the Python API for Kinesis
> --
>
> Key: SPARK-8564
> URL: https://issues.apache.org/jira/browse/SPARK-8564
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6785:
---
Labels:   (was: spark.tc)

> DateUtils can not handle date before 1970/01/01 correctly
> -
>
> Key: SPARK-6785
> URL: https://issues.apache.org/jira/browse/SPARK-6785
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Christian Kadner
> Fix For: 1.5.0
>
>
> {code}
> scala> val d = new Date(100)
> d: java.sql.Date = 1969-12-31
> scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
> res1: java.sql.Date = 1970-01-01
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5562) LDA should handle empty documents

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5562:
---
Labels: starter  (was: spark.tc starter)

> LDA should handle empty documents
> -
>
> Key: SPARK-5562
> URL: https://issues.apache.org/jira/browse/SPARK-5562
> Project: Spark
>  Issue Type: Test
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Alok Singh
>Priority: Minor
>  Labels: starter
> Fix For: 1.5.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Latent Dirichlet Allocation (LDA) could easily be given empty documents when 
> people select a small vocabulary.  We should check to make sure it is robust 
> to empty documents.
> This will hopefully take the form of a unit test, but may require modifying 
> the LDA implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7357) Improving HBaseTest example

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7357:
---
Labels:   (was: spark.tc)

> Improving HBaseTest example
> ---
>
> Key: SPARK-7357
> URL: https://issues.apache.org/jira/browse/SPARK-7357
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 1.3.1
>Reporter: Jihong MA
>Assignee: Jihong MA
>Priority: Minor
> Fix For: 1.5.0
>
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> Minor improvement to the HBaseTest example: when HBase-related configurations 
> (e.g. zookeeper quorum, zookeeper client port, or zookeeper.znode.parent) are 
> not set to the default (localhost:2181), the connection to ZooKeeper might 
> hang, as shown in the following stack:
> 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=xxx.xxx.xxx:2181 sessionTimeout=9 
> watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
> 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to 
> xxx.xxx.xxx/9.30.94.121:2181, initiating session
> 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, 
> negotiated timeout = 4
> 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper 
> is null
> This is because hbase-site.xml is not placed on the Spark classpath. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8746:
---
Labels: documentation test  (was: documentation spark.tc test)

> Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
> --
>
> Key: SPARK-8746
> URL: https://issues.apache.org/jira/browse/SPARK-8746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Christian Kadner
>Assignee: Christian Kadner
>Priority: Trivial
>  Labels: documentation, test
> Fix For: 1.4.1, 1.5.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) 
> describes how to generate golden answer files for new hive comparison test 
> cases. However the download link for the Hive 0.13.1 jars points to 
> https://hive.apache.org/downloads.html but none of the linked mirror sites 
> still has the 0.13.1 version.
> We need to update the link to 
> https://archive.apache.org/dist/hive/hive-0.13.1/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7265:
---
Labels:   (was: spark.tc)

> Improving documentation for Spark SQL Hive support 
> ---
>
> Key: SPARK-7265
> URL: https://issues.apache.org/jira/browse/SPARK-7265
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.3.1
>Reporter: Jihong MA
>Assignee: Jihong MA
>Priority: Trivial
> Fix For: 1.5.0
>
>
> miscellaneous documentation improvement for Spark SQL Hive support, Yarn 
> cluster deployment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2859:
---
Labels:   (was: spark.tc)

> Update url of Kryo project in related docs
> --
>
> Key: SPARK-2859
> URL: https://issues.apache.org/jira/browse/SPARK-2859
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Guancheng Chen
>Assignee: Guancheng Chen
>Priority: Trivial
> Fix For: 1.0.3, 1.1.0
>
>
> Kryo project has been migrated from googlecode to github, hence we need to 
> update its URL in related docs such as tuning.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8639:
---
Labels:   (was: spark.tc)

> Instructions for executing jekyll in docs/README.md could be slightly more 
> clear, typo in docs/api.md
> -
>
> Key: SPARK-8639
> URL: https://issues.apache.org/jira/browse/SPARK-8639
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Rosstin Murphy
>Assignee: Rosstin Murphy
>Priority: Trivial
> Fix For: 1.4.1, 1.5.0
>
>
> In docs/README.md, the text states around line 31
> Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll 
> will create a directory called '_site' containing index.html as well as the 
> rest of the compiled files.
> It might be more clear if we said
> Execute 'jekyll build' from the 'docs/' directory to compile the site. 
> Compiling the site with Jekyll will create a directory called '_site' 
> containing index.html as well as the rest of the compiled files.
> In docs/api.md: "Here you can API docs for Spark and its submodules."
> should be something like: "Here you can read API docs for Spark and its 
> submodules."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6485:
---
Labels:   (was: spark.tc)

> Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
> --
>
> Key: SPARK-6485
> URL: https://issues.apache.org/jira/browse/SPARK-6485
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Xiangrui Meng
>
> We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in 
> PySpark. Internally, we can use DataFrames for serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7744) "Distributed matrix" section in MLlib "Data Types" documentation should be reordered.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7744:
---
Labels:   (was: spark.tc)

> "Distributed matrix" section in MLlib "Data Types" documentation should be 
> reordered.
> -
>
> Key: SPARK-7744
> URL: https://issues.apache.org/jira/browse/SPARK-7744
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, MLlib
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.3.2, 1.4.0
>
>
> The documentation for BlockMatrix should come after RowMatrix, 
> IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter 
> three types, and RowMatrix is considered the "basic" distributed matrix.  
> This will improve comprehensibility of the "Distributed matrix" section, 
> especially for the new reader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7426:
---
Labels:   (was: spark.tc)

> spark.ml AttributeFactory.fromStructField should allow other NumericTypes
> -
>
> Key: SPARK-7426
> URL: https://issues.apache.org/jira/browse/SPARK-7426
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.5.0
>
>
> It currently only supports DoubleType, but it should support others, at least 
> for fromStructField (importing into ML attribute format, rather than 
> exporting).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8570) Improve MLlib Local Matrix Documentation.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8570:
---
Labels:   (was: spark.tc)

> Improve MLlib Local Matrix Documentation.
> -
>
> Key: SPARK-8570
> URL: https://issues.apache.org/jira/browse/SPARK-8570
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, MLlib
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.5.0
>
>
> Update the MLlib Data Types Local Matrix documentation as follows:
> -Include information on sparse matrices.
> -Add sparse matrix examples to the existing Scala and Java examples.
> -Add Python examples for both dense and sparse matrices (currently no Python 
> examples exist for the Local Matrix section).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7883:
---
Labels:   (was: spark.tc)

> Fixing broken trainImplicit example in MLlib Collaborative Filtering 
> documentation.
> ---
>
> Key: SPARK-7883
> URL: https://issues.apache.org/jira/browse/SPARK-7883
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Trivial
> Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0
>
>
> The trainImplicit Scala example near the end of the MLlib Collaborative 
> Filtering documentation refers to an ALS.trainImplicit function signature 
> that does not exist.  Rather than add an extra function, let's just fix the 
> example.
> Currently, the example refers to a function that would have the following 
> signature: 
> def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: 
> Double) : MatrixFactorizationModel
> Instead, let's change the example to refer to this function, which does exist 
> (notice the addition of the lambda parameter):
> def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: 
> Double, alpha: Double) : MatrixFactorizationModel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8343:
---
Labels:   (was: spark.tc)

> Improve the Spark Streaming Guides
> --
>
> Key: SPARK-8343
> URL: https://issues.apache.org/jira/browse/SPARK-8343
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Streaming
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.4.1, 1.5.0
>
>
> Improve the Spark Streaming Guides by fixing broken links, rewording 
> confusing sections, fixing typos, adding missing words, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7977) Disallow println

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7977:
---
Labels: starter  (was: spark.tc starter)

> Disallow println
> 
>
> Key: SPARK-7977
> URL: https://issues.apache.org/jira/browse/SPARK-7977
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Reynold Xin
>Assignee: Jon Alter
>  Labels: starter
> Fix For: 1.5.0
>
>
> Very often we see pull requests that added println from debugging, but the 
> author forgot to remove it before code review.
> We can use the regex checker to disallow println. For legitimate use of 
> println, we can then disable the rule where they are used.
> Add to scalastyle-config.xml file:
> {code}
> <check class="org.scalastyle.scalariform.TokenChecker" enabled="true">
>   <parameters><parameter name="regex">^println$</parameter></parameters>
> </check>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7969:
---
Labels:   (was: spark.tc)

> Drop method on Dataframes should handle Column
> --
>
> Key: SPARK-7969
> URL: https://issues.apache.org/jira/browse/SPARK-7969
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.4.0
>Reporter: Olivier Girardot
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.4.1, 1.5.0
>
>
> For now the drop method available on DataFrame since Spark 1.4.0 only accepts 
> a column name (as a string); it should also accept a Column as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7830) ML doc cleanup: logreg, classification link

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7830:
---
Labels:   (was: spark.tc)

> ML doc cleanup: logreg, classification link
> ---
>
> Key: SPARK-7830
> URL: https://issues.apache.org/jira/browse/SPARK-7830
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, MLlib
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Trivial
> Fix For: 1.4.0
>
>
> Add logistic regression to the list of Multiclass Classification Supported 
> Methods in the MLlib Classification and Regression documentation, and fix 
> related broken link.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8927:
---
Labels:   (was: spark.tc)

> Doc format wrong for some config descriptions
> -
>
> Key: SPARK-8927
> URL: https://issues.apache.org/jira/browse/SPARK-8927
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: Jon Alter
>Assignee: Jon Alter
>Priority: Trivial
> Fix For: 1.4.2, 1.5.0
>
>
> In the docs, a couple of configuration descriptions (under Network) are not 
> inside their table cells and are being displayed immediately under the section 
> title instead of in their row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7985) Remove "fittingParamMap" references. Update ML Doc "Estimator, Transformer, and Param" examples.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7985:
---
Labels:   (was: spark.tc)

> Remove "fittingParamMap" references. Update ML Doc "Estimator, Transformer, 
> and Param" examples.
> 
>
> Key: SPARK-7985
> URL: https://issues.apache.org/jira/browse/SPARK-7985
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, ML
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.4.0
>
>
> Update ML Doc's "Estimator, Transformer, and Param" Scala & Java examples to 
> use model.extractParamMap instead of model.fittingParamMap, which no longer 
> exists.  Remove all other references to fittingParamMap throughout Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7920:
---
Labels:   (was: spark.tc)

> Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
> 
>
> Key: SPARK-7920
> URL: https://issues.apache.org/jira/browse/SPARK-7920
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: 1.4.0
>
>
> The MLlib ChiSqSelector class is not serializable, and so the example in the 
> ChiSqSelector documentation fails.  Also, that example is missing the import 
> of ChiSqSelector.  ChiSqSelector should just extend Serializable.
> Steps:
> 1. Locate the MLlib ChiSqSelector documentation example.
> 2. Fix the example by adding an import statement for ChiSqSelector.
> 3. Attempt to run -> notice that it will fail due to ChiSqSelector not being 
> serializable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2015-07-13 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625739#comment-14625739
 ] 

Patrick Wendell edited comment on SPARK-1403 at 7/14/15 2:59 AM:
-

Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible about the environment? Thanks!


was (Author: pwendell):
Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible without the environment? Thanks!

> Spark on Mesos does not set Thread's context class loader
> -
>
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.3.0, 1.4.0
> Environment: ubuntu 12.04 on vagrant
>Reporter: Bharath Bhushan
>Priority: Blocker
> Fix For: 1.0.0
>
>
> I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
> executor on mesos slave throws a  java.lang.ClassNotFoundException for 
> org.apache.spark.serializer.JavaSerializer.
> The lengthy discussion is here: 
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2015-07-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1403.

  Resolution: Fixed
Target Version/s:   (was: 1.5.0)

Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible without the environment? Thanks!

> Spark on Mesos does not set Thread's context class loader
> -
>
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0, 1.3.0, 1.4.0
> Environment: ubuntu 12.04 on vagrant
>Reporter: Bharath Bhushan
>Priority: Blocker
> Fix For: 1.0.0
>
>
> I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
> executor on mesos slave throws a  java.lang.ClassNotFoundException for 
> org.apache.spark.serializer.JavaSerializer.
> The lengthy discussion is here: 
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2015-07-12 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624086#comment-14624086
 ] 

Patrick Wendell commented on SPARK-2089:


Yeah - we can reopen it later if someone who maintains this code wants to work 
on this feature. I just want this JIRA to reflect the current status (there has 
been no action in Spark for 5 versions), which is that it is not actively being 
fixed, and to make sure the documentation correctly reflects what we have now, 
to discourage use of a feature that does not work.

> With YARN, preferredNodeLocalityData isn't honored 
> ---
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when 
> constructing a Spark context that will dictate where to request executor 
> containers.
> This is currently broken because of a race condition.  The Spark-YARN code 
> runs the user class and waits for it to start up a SparkContext.  During its 
> initialization, the SparkContext will create a YarnClusterScheduler, which 
> notifies a monitor in the Spark-YARN code that the SparkContext has been 
> initialized.  The Spark-YARN code then 
> immediately fetches the preferredNodeLocationData from the SparkContext and 
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, 
> setting preferredNodeLocationData comes after the rest of the initialization, 
> so, if the Spark-YARN code comes around quickly enough after being notified, 
> the data that's fetched is the empty unset version.  This occurred during all 
> of my runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


