[jira] [Commented] (SPARK-16685) audit release docs are ambiguous
[ https://issues.apache.org/jira/browse/SPARK-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391168#comment-15391168 ] Patrick Wendell commented on SPARK-16685:

These scripts are pretty old and I'm not sure anyone still uses them. I wrote them a while back as sanity tests for some release builds. Today those things are tested broadly by the community, so I think this has become redundant. [~rxin], are these still used? If not, it might be good to remove them from the source repo.

> audit release docs are ambiguous
>
> Key: SPARK-16685
> URL: https://issues.apache.org/jira/browse/SPARK-16685
> Project: Spark
> Issue Type: Improvement
> Components: Tests
> Affects Versions: 1.6.2
> Reporter: jay vyas
> Priority: Minor
>
> The dev/audit-release tooling is ambiguous:
> - Should it run against a real cluster? If so, when?
> - What should be in the release repo? Just jars? Tarballs? (I assume jars, because it's an .ivy repo, but I'm not sure.)
> - https://github.com/apache/spark/tree/master/dev/audit-release

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download
[ https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-13855.

Resolution: Fixed
Fix Version/s: 1.6.1

> Spark 1.6.1 artifacts not found in S3 bucket / direct download
>
> Key: SPARK-13855
> URL: https://issues.apache.org/jira/browse/SPARK-13855
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.6.1
> Environment: production
> Reporter: Sandesh Deshmane
> Assignee: Patrick Wendell
> Fix For: 1.6.1
>
> Getting the error below while deploying Spark on EC2 with version 1.6.1:
>
> [timing] scala init: 00h 00m 12s
> Initializing spark
> --2016-03-14 07:05:30--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
> Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.50.12
> Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.50.12|:80... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2016-03-14 07:05:30 ERROR 404: Not Found.
> ERROR: Unknown Spark version
> spark/init.sh: line 137: return: -1: invalid option
> return: usage: return [n]
> Unpacking Spark
> tar (child): spark-*.tgz: Cannot open: No such file or directory
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> rm: cannot remove `spark-*.tgz': No such file or directory
> mv: missing destination file operand after `spark'
> Try `mv --help' for more information.
>
> Checked the S3 bucket spark-related-packages and noticed that no Spark 1.6.1 artifacts are present.
[jira] [Commented] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download
[ https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196901#comment-15196901 ] Patrick Wendell commented on SPARK-13855:

I've uploaded the artifacts, thanks.
[jira] [Assigned] (SPARK-13855) Spark 1.6.1 artifacts not found in S3 bucket / direct download
[ https://issues.apache.org/jira/browse/SPARK-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reassigned SPARK-13855:

Assignee: Patrick Wendell (was: Michael Armbrust)
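The failure in SPARK-13855 begins with a 404 on the binary tarball and then cascades into the `tar`/`rm`/`mv` errors. A pre-flight check before launching a cluster would fail fast instead. The sketch below is not part of spark-ec2; the URL pattern is taken from the wget line in the log, and `artifact_url`/`artifact_exists` are hypothetical helper names.

```python
# Pre-flight check (illustrative, not a Spark tool): confirm the release
# tarball actually exists in the bucket before spark-ec2 tries to fetch it.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Bucket base URL as it appears in the wget output above.
BUCKET = "http://s3.amazonaws.com/spark-related-packages"

def artifact_url(spark_version, hadoop_build):
    """Build the tarball URL following the spark-<ver>-bin-<hadoop>.tgz pattern."""
    return "%s/spark-%s-bin-%s.tgz" % (BUCKET, spark_version, hadoop_build)

def artifact_exists(url, timeout=10):
    """HEAD the URL; a 404 (as in the log above) returns False instead of
    letting the launch script continue with a missing file."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

# Usage sketch (network call, so not executed here):
#   if not artifact_exists(artifact_url("1.6.1", "hadoop2.4")):
#       raise SystemExit("release artifact missing; aborting launch")
```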
[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame
[ https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-12148:

Priority: Major (was: Critical)

> SparkR: rename DataFrame to SparkDataFrame
>
> Key: SPARK-12148
> URL: https://issues.apache.org/jira/browse/SPARK-12148
> Project: Spark
> Issue Type: Improvement
> Components: R, SparkR
> Reporter: Michael Lawrence
>
> The SparkR package represents a Spark DataFrame with the class "DataFrame". That conflicts with the more general DataFrame class defined in the S4Vectors package. Would it not be more appropriate to use the name "SparkDataFrame" instead?
[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame
[ https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-12148:

Priority: Critical (was: Minor)
[jira] [Updated] (SPARK-12148) SparkR: rename DataFrame to SparkDataFrame
[ https://issues.apache.org/jira/browse/SPARK-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-12148:

Issue Type: Improvement (was: Wish)
[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-12110:

Description:
I am using spark-1.5.1-bin-hadoop2.6. I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured spark-env to use python3. I cannot run the tokenizer sample code. Is there a workaround?

Kind regards

Andy

{code}
/root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
    658                 raise Exception("You must build Spark with Hive. "
    659                                 "Export 'SPARK_HIVE=true' and run "
--> 660                                 "build/sbt assembly", e)
    661
    662     def _get_hive_ctx(self):

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError('An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))
{code}

The sample code, from http://spark.apache.org/docs/latest/ml-features.html#tokenizer:

{code}
from pyspark.ml.feature import Tokenizer, RegexTokenizer

sentenceDataFrame = sqlContext.createDataFrame([
    (0, "Hi I heard about Spark"),
    (1, "I wish Java could use case classes"),
    (2, "Logistic,regression,models,are,neat")
], ["label", "sentence"])
tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
wordsDataFrame = tokenizer.transform(sentenceDataFrame)
for words_label in wordsDataFrame.select("words", "label").take(3):
    print(words_label)
{code}

The full traceback:

{code}
Py4JJavaError                             Traceback (most recent call last)
/root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
    654             if not hasattr(self, '_scala_HiveContext'):
--> 655                 self._scala_HiveContext = self._get_hive_ctx()
    656             return self._scala_HiveContext

/root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self)
    662     def _get_hive_ctx(self):
--> 663         return self._jvm.HiveContext(self._jsc.sc())
    664

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    700         return_value = get_return_value(answer, self._gateway_client, None,
--> 701                                         self._fqn)
    702

/root/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     35     try:
---> 36         return f(*a, **kw)
     37     except py4j.protocol.Py4JJavaError as e:

/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    299                 'An error occurred while calling {0}{1}{2}.\n'.
--> 300                 format(target_id, '.', name), value)
    301             else:

Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1057)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:554)
	at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
	... 15 more

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
{code}
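One mitigation idea (not proposed in this thread, just a sketch): the tokenizer example only needs `createDataFrame`, not Hive support, so a guarded construction that falls back from the Hive-backed context to a plain one would let the sample run. The stand-in constructors below simulate the behavior; on a real cluster they would be the `HiveContext`/`SQLContext` constructors from pyspark.

```python
# Guarded-construction pattern: try the Hive-backed context first, fall back
# to a plain context when the JVM-side constructor fails (as in the
# Py4JJavaError quoted above). make_sql_context is a hypothetical helper.
def make_sql_context(hive_ctor, plain_ctor):
    """Return hive_ctor() if it succeeds, otherwise plain_ctor()."""
    try:
        return hive_ctor()
    except Exception:
        # e.g. "java.io.IOException: Filesystem closed" surfacing via py4j
        return plain_ctor()

# Simulated failure mirroring the traceback in the description:
def broken_hive():
    raise RuntimeError("java.io.IOException: Filesystem closed")

ctx = make_sql_context(broken_hive, lambda: "SQLContext")
```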
[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-12110:

Component/s: EC2 (was: ML, SQL, PySpark)
[jira] [Commented] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036960#comment-15036960 ] Patrick Wendell commented on SPARK-12110:

Hey Andrew, could you show exactly the command you are running to run this example? Also, if you simply download Spark 1.5.1 and run the same command locally rather than on your modified EC2 cluster, does it work?
[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021511#comment-15021511 ] Patrick Wendell commented on SPARK-11903:

I think it's simply dead code. SKIP_JAVA_TEST relates to a check we used to do for whether Java 6 was being used instead of Java 7; it doesn't have anything to do with unit tests. Spark now requires Java 7, so the check has been removed, but the option parser still handles that variable. It was simply an omission that it was not deleted as part of SPARK-7733 (https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14). /cc [~srowen]

> Deprecate make-distribution.sh --skip-java-test
>
> Key: SPARK-11903
> URL: https://issues.apache.org/jira/browse/SPARK-11903
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Nicholas Chammas
> Priority: Minor
>
> The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does not appear to be used|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73], and tests are [always skipped|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L170]. Searching the Spark codebase for {{SKIP_JAVA_TEST}} yields no results other than [this one|https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/make-distribution.sh#L72-L73]. If this option is not needed, we should deprecate and eventually remove it.
[jira] [Comment Edited] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021511#comment-15021511 ] Patrick Wendell edited comment on SPARK-11903 at 11/23/15 4:29 AM:

I think it's simply dead code that should be deleted. SKIP_JAVA_TEST relates to a check we used to do for whether Java 6 was being used instead of Java 7; it doesn't have anything to do with unit tests. Spark now requires Java 7, so the check has been removed, but the option parser still handles that variable. It was simply an omission that it was not deleted as part of SPARK-7733 (https://github.com/apache/spark/commit/e84815dc333a69368a48e0152f02934980768a14). /cc [~srowen]
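The situation described for SKIP_JAVA_TEST — an option the parser still accepts but nothing ever reads — can be detected mechanically. A small illustrative sketch (not a Spark tool; `unused_option_vars` is a hypothetical helper) that flags shell variables assigned by an option parser but never dereferenced afterwards:

```python
# Flag shell variables that are assigned (e.g. by a case/esac option parser)
# but never read via $VAR or ${VAR} anywhere in the script.
import re

def unused_option_vars(script_text, var_names):
    """Return the names in var_names that are assigned but never read."""
    unused = []
    for name in var_names:
        assigns = re.findall(rf"\b{name}=", script_text)
        reads = re.findall(rf"\$\{{?{name}\b", script_text)
        if assigns and not reads:
            unused.append(name)
    return unused

# Toy script mirroring the make-distribution.sh pattern described above:
# SKIP_JAVA_TEST is set by a flag but never used; NAME is set and used.
script = """
while (( "$#" )); do
  case $1 in
    --skip-java-test) SKIP_JAVA_TEST=false ;;
    --name) NAME=$2; shift ;;
  esac
  shift
done
echo "Building distribution $NAME"
"""
```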
[jira] [Commented] (SPARK-11326) Support for authentication and encryption in standalone mode
[ https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997448#comment-14997448 ] Patrick Wendell commented on SPARK-11326:

There are a few related conversations here:

1. The feature set and goals of the standalone scheduler. The main goal of that scheduler is to make it easy for people to download and run Spark with minimal extra dependencies. The main difference between the standalone mode and other schedulers is that we aren't providing support for scheduling frameworks other than Spark (and likely never will). Beyond that, features are added on a case-by-case basis, depending on whether there is sufficient commitment from the maintainers to support the feature long term.

2. Security in non-YARN modes. I would actually like to see better support for security in other modes of Spark, the main reason being to support the large number of users not inside of Hadoop deployments. BTW, I think the existing security architecture of Spark makes this possible, because the concern of distributing a shared secret is largely decoupled from the specific security mechanism. But we haven't really exposed public hooks for injecting secrets. There is also the question of secure job submission, which is addressed in this JIRA. This needs some thought and probably makes sense to discuss on the Spark 1.7 timeframe.

Overall, I think some broader questions need to be answered, and it's something we can perhaps discuss once 1.6 is out the door as we think about 1.7.

> Support for authentication and encryption in standalone mode
>
> Key: SPARK-11326
> URL: https://issues.apache.org/jira/browse/SPARK-11326
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components, for all network connections, need to use the same secure token if they want to have any security ensured.
> This ticket is intended to split the communication in standalone mode to make it more like in YARN mode - application internal communication and scheduler communication.
> Such refactoring will allow the scheduler (master, workers) to use a distinct secret, which will remain unknown to the users. Similarly, it will allow for better security in applications, because each application will be able to use a distinct secret as well.
> By providing SASL authentication/encryption for connections between a client (Client or AppClient) and Spark Master, it becomes possible to introduce pluggable authentication for the standalone deployment mode.
> h3.Improvements introduced by this patch
> This patch introduces the following changes:
> * The Spark driver or submission client does not have to use the same secret as workers use to communicate with Master
> * Master is able to authenticate individual clients with the following rules:
> ** When connecting to the master, the client needs to specify {{spark.authenticate.secret}}, which is an authentication token for the user specified by {{spark.authenticate.user}} ({{sparkSaslUser}} by default)
> ** Master configuration may include additional {{spark.authenticate.secrets.}} entries for specifying authentication tokens for particular users, or {{spark.authenticate.authenticatorClass}}, which specifies an implementation of an external credentials provider (able to retrieve the authentication token for a given user)
> ** Workers authenticate with Master as the default user {{sparkSaslUser}}
> * The authorization rules are as follows:
> ** A regular user is able to manage only his own application (the application which he submitted)
> ** A regular user is not able to register or manage workers
> ** The Spark default user {{sparkSaslUser}} can manage all the applications
> h3.User facing changes when running application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations the env variable will overwrite the conf variable if present
> - In all situations when a user has to pass a secret, it is better (safer) to do this through the env variable
> - In work modes with multiple secrets we assume encrypted communication between client and master, between driver and master, and between master and workers
>
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf: {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf: {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf:
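The configuration surface proposed in the ticket can be sketched concretely. This is not shipped Spark API: the property names are quoted from the ticket text, `master_auth_conf` is a hypothetical helper, and appending the username after `spark.authenticate.secrets.` is my reading of the elided per-user entries.

```python
# Sketch of the master-side configuration this proposal describes: one
# default secret (used by workers as sparkSaslUser) plus per-user entries.
def master_auth_conf(user_secrets, default_secret):
    """Map per-user secrets onto the proposed spark.* properties."""
    conf = {
        "spark.authenticate": "true",
        # Token for the default identity (sparkSaslUser in the ticket).
        "spark.authenticate.secret": default_secret,
    }
    for user, secret in user_secrets.items():
        # One entry per named user the master can authenticate
        # (assumed shape: spark.authenticate.secrets.<username>).
        conf["spark.authenticate.secrets." + user] = secret
    return conf
```

Under the proposal's authorization rules, a client presenting the secret stored under its own username could then manage only its own applications, while the default identity retains full control.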
[jira] [Resolved] (SPARK-11236) Upgrade Tachyon dependency to 0.8.0
[ https://issues.apache.org/jira/browse/SPARK-11236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-11236.

Resolution: Fixed
Fix Version/s: 1.6.0

> Upgrade Tachyon dependency to 0.8.0
>
> Key: SPARK-11236
> URL: https://issues.apache.org/jira/browse/SPARK-11236
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.5.1
> Reporter: Calvin Jia
> Fix For: 1.6.0
>
> Update the tachyon-client dependency from 0.7.1 to 0.8.0. There are no new dependencies added or Spark-facing APIs changed.
[jira] [Updated] (SPARK-11236) Upgrade Tachyon dependency to 0.8.0
[ https://issues.apache.org/jira/browse/SPARK-11236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11236: Assignee: Calvin Jia > Upgrade Tachyon dependency to 0.8.0 > --- > > Key: SPARK-11236 > URL: https://issues.apache.org/jira/browse/SPARK-11236 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Calvin Jia >Assignee: Calvin Jia > Fix For: 1.6.0 > > > Update the tachyon-client dependency from 0.7.1 to 0.8.0. There are no new > dependencies added or Spark facing APIs changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11446) Spark 1.6 release notes
[ https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11446: Target Version/s: 1.6.0 > Spark 1.6 release notes > --- > > Key: SPARK-11446 > URL: https://issues.apache.org/jira/browse/SPARK-11446 > Project: Spark > Issue Type: Task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > This is a staging location where we can keep track of changes that need to be > documented in the release notes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11446) Spark 1.6 release notes
Patrick Wendell created SPARK-11446: --- Summary: Spark 1.6 release notes Key: SPARK-11446 URL: https://issues.apache.org/jira/browse/SPARK-11446 Project: Spark Issue Type: Task Components: Documentation Reporter: Patrick Wendell Assignee: Michael Armbrust Priority: Critical This is a staging location where we can keep track of changes that need to be documented in the release notes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11238) SparkR: Documentation change for merge function
[ https://issues.apache.org/jira/browse/SPARK-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984646#comment-14984646 ] Patrick Wendell commented on SPARK-11238: - I created SPARK-11446 and linked it here. > SparkR: Documentation change for merge function > --- > > Key: SPARK-11238 > URL: https://issues.apache.org/jira/browse/SPARK-11238 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Narine Kokhlikyan > Labels: releasenotes > > As discussed in pull request: https://github.com/apache/spark/pull/9012, the > signature of the merge function will be changed, therefore documentation > change is required. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11446) Spark 1.6 release notes
[ https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984776#comment-14984776 ] Patrick Wendell commented on SPARK-11446: - I think this is redundant with the "releasenotes" tag so I am closing it. > Spark 1.6 release notes > --- > > Key: SPARK-11446 > URL: https://issues.apache.org/jira/browse/SPARK-11446 > Project: Spark > Issue Type: Task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > This is a staging location where we can keep track of changes that need to be > documented in the release notes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-11446) Spark 1.6 release notes
[ https://issues.apache.org/jira/browse/SPARK-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell closed SPARK-11446. --- Resolution: Invalid > Spark 1.6 release notes > --- > > Key: SPARK-11446 > URL: https://issues.apache.org/jira/browse/SPARK-11446 > Project: Spark > Issue Type: Task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > This is a staging location where we can keep track of changes that need to be > documented in the release notes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page
[ https://issues.apache.org/jira/browse/SPARK-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973493#comment-14973493 ] Patrick Wendell commented on SPARK-11305: - /cc [~srowen] for his thoughts. > Remove Third-Party Hadoop Distributions Doc Page > > > Key: SPARK-11305 > URL: https://issues.apache.org/jira/browse/SPARK-11305 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Patrick Wendell >Priority: Critical > > There is a fairly old page in our docs that contains a bunch of assorted > information regarding running Spark on Hadoop clusters. I think this page > should be removed and merged into other parts of the docs because the > information is largely redundant and somewhat outdated. > http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html > There are four sections: > 1. Compile time Hadoop version - this information I think can be removed in > favor of that on the "building spark" page. These days most "advanced users" > are building without bundling Hadoop, so I'm not sure giving them a bunch of > different Hadoop versions sends the right message. > 2. Linking against Hadoop - this doesn't seem to add much beyond what is in > the programming guide. > 3. Where to run Spark - redundant with the hardware provisioning guide. > 4. Inheriting cluster configurations - I think this would be better as a > section at the end of the configuration page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11305) Remove Third-Party Hadoop Distributions Doc Page
Patrick Wendell created SPARK-11305: --- Summary: Remove Third-Party Hadoop Distributions Doc Page Key: SPARK-11305 URL: https://issues.apache.org/jira/browse/SPARK-11305 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Patrick Wendell Priority: Critical There is a fairly old page in our docs that contains a bunch of assorted information regarding running Spark on Hadoop clusters. I think this page should be removed and merged into other parts of the docs because the information is largely redundant and somewhat outdated. http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html There are four sections: 1. Compile time Hadoop version - this information I think can be removed in favor of that on the "building spark" page. These days most "advanced users" are building without bundling Hadoop, so I'm not sure giving them a bunch of different Hadoop versions sends the right message. 2. Linking against Hadoop - this doesn't seem to add much beyond what is in the programming guide. 3. Where to run Spark - redundant with the hardware provisioning guide. 4. Inheriting cluster configurations - I think this would be better as a section at the end of the configuration page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973510#comment-14973510 ] Patrick Wendell commented on SPARK-10971: - Reynold has sent out the vote email based on the original fix. Since that vote is likely to pass, this patch will probably be in 1.5.3. > sparkR: RRunner should allow setting path to Rscript > > > Key: SPARK-10971 > URL: https://issues.apache.org/jira/browse/SPARK-10971 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Sun Rui > Fix For: 1.5.3, 1.6.0 > > > I'm running spark on yarn and trying to use R in cluster mode. RRunner seems > to just call Rscript and assumes it's in the path. But on our YARN deployment > R isn't installed on the nodes so it needs to be distributed along with the > job and we need the ability to point to where it gets installed. sparkR in > client mode has the config spark.sparkr.r.command to point to Rscript. > RRunner should have something similar so it works in cluster mode -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973510#comment-14973510 ] Patrick Wendell edited comment on SPARK-10971 at 10/26/15 12:02 AM: Reynold has sent out the vote email based on the tagged commit. Since that vote is likely to pass, this patch will probably be in 1.5.3. was (Author: pwendell): Reynold has sent out the vote email based on the original fix. Since that vote is likely to pass, this patch will probably be in 1.5.3. > sparkR: RRunner should allow setting path to Rscript > > > Key: SPARK-10971 > URL: https://issues.apache.org/jira/browse/SPARK-10971 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Sun Rui > Fix For: 1.5.3, 1.6.0 > > > I'm running spark on yarn and trying to use R in cluster mode. RRunner seems > to just call Rscript and assumes it's in the path. But on our YARN deployment > R isn't installed on the nodes so it needs to be distributed along with the > job and we need the ability to point to where it gets installed. sparkR in > client mode has the config spark.sparkr.r.command to point to Rscript. > RRunner should have something similar so it works in cluster mode -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10971) sparkR: RRunner should allow setting path to Rscript
[ https://issues.apache.org/jira/browse/SPARK-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10971: Fix Version/s: (was: 1.5.2) 1.5.3 > sparkR: RRunner should allow setting path to Rscript > > > Key: SPARK-10971 > URL: https://issues.apache.org/jira/browse/SPARK-10971 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Sun Rui > Fix For: 1.5.3, 1.6.0 > > > I'm running spark on yarn and trying to use R in cluster mode. RRunner seems > to just call Rscript and assumes it's in the path. But on our YARN deployment > R isn't installed on the nodes so it needs to be distributed along with the > job and we need the ability to point to where it gets installed. sparkR in > client mode has the config spark.sparkr.r.command to point to Rscript. > RRunner should have something similar so it works in cluster mode -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
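The behaviour requested in SPARK-10971 (a configurable path to the Rscript binary, falling back to a plain PATH lookup) boils down to a one-line resolution rule. A minimal sketch, reusing the spark.sparkr.r.command key mentioned in the report and assuming a dict-like conf; RRunner's real code is Scala and differs in detail.

```python
def rscript_command(conf):
    """Resolve the Rscript executable for cluster mode.

    An explicit config entry wins; otherwise fall back to the bare
    name and rely on the operating system's PATH lookup. Illustrative
    sketch, not RRunner's actual implementation.
    """
    return conf.get("spark.sparkr.r.command", "Rscript")
```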
[jira] [Assigned] (SPARK-11070) Remove older releases on dist.apache.org
[ https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reassigned SPARK-11070: --- Assignee: Patrick Wendell > Remove older releases on dist.apache.org > > > Key: SPARK-11070 > URL: https://issues.apache.org/jira/browse/SPARK-11070 > Project: Spark > Issue Type: Task > Components: Build >Reporter: Sean Owen >Assignee: Patrick Wendell >Priority: Trivial > Attachments: SPARK-11070.patch > > > dist.apache.org should be periodically cleaned up such that it only includes > the latest releases in each active minor release branch. This is to reduce > load on mirrors. It can probably lose the 1.2.x releases at this point. In > total this would clean out 6 of the 9 releases currently mirrored at > https://dist.apache.org/repos/dist/release/spark/ > All releases are always archived at archive.apache.org and continue to be > available. The JS behind spark.apache.org/downloads.html needs to be updated > to point at archive.apache.org for older releases, then. > There won't be a pull request for this as it's strictly an update to the site > hosted in SVN, and the files hosted by Apache. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11070) Remove older releases on dist.apache.org
[ https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961515#comment-14961515 ] Patrick Wendell commented on SPARK-11070: - I removed them - I did leave 1.5.0 for now, but we can remove it in a bit - just because 1.5.1 is so new. {code} svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.1 -m "Removing Spark 1.1.1 release" svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.2.1 -m "Removing Spark 1.2.1 release" svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.2.2 -m "Removing Spark 1.2.2 release" svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.3.0 -m "Removing Spark 1.3.0 release" svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.4.0 -m "Removing Spark 1.4.0 release" {code} > Remove older releases on dist.apache.org > > > Key: SPARK-11070 > URL: https://issues.apache.org/jira/browse/SPARK-11070 > Project: Spark > Issue Type: Task > Components: Build >Reporter: Sean Owen >Assignee: Patrick Wendell >Priority: Trivial > Attachments: SPARK-11070.patch > > > dist.apache.org should be periodically cleaned up such that it only includes > the latest releases in each active minor release branch. This is to reduce > load on mirrors. It can probably lose the 1.2.x releases at this point. In > total this would clean out 6 of the 9 releases currently mirrored at > https://dist.apache.org/repos/dist/release/spark/ > All releases are always archived at archive.apache.org and continue to be > available. The JS behind spark.apache.org/downloads.html needs to be updated > to point at archive.apache.org for older releases, then. > There won't be a pull request for this as it's strictly an update to the site > hosted in SVN, and the files hosted by Apache. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
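The five svn commands above differ only in the version string, so the same cleanup can be generated from a list. A sketch that only builds and prints the commands rather than executing them; the version list is the one retired in the comment, and the helper name is made up for the example.

```python
# Sketch: generate the svn commands that retire superseded releases.
# Nothing is executed here; each printed line is one atomic svn commit.
BASE = "https://dist.apache.org/repos/dist/release/spark"

def removal_commands(versions):
    return [
        f'svn rm {BASE}/spark-{v} -m "Removing Spark {v} release"'
        for v in versions
    ]

for cmd in removal_commands(["1.1.1", "1.2.1", "1.2.2", "1.3.0", "1.4.0"]):
    print(cmd)
```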
[jira] [Resolved] (SPARK-11070) Remove older releases on dist.apache.org
[ https://issues.apache.org/jira/browse/SPARK-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-11070. - Resolution: Fixed > Remove older releases on dist.apache.org > > > Key: SPARK-11070 > URL: https://issues.apache.org/jira/browse/SPARK-11070 > Project: Spark > Issue Type: Task > Components: Build >Reporter: Sean Owen >Assignee: Patrick Wendell >Priority: Trivial > Attachments: SPARK-11070.patch > > > dist.apache.org should be periodically cleaned up such that it only includes > the latest releases in each active minor release branch. This is to reduce > load on mirrors. It can probably lose the 1.2.x releases at this point. In > total this would clean out 6 of the 9 releases currently mirrored at > https://dist.apache.org/repos/dist/release/spark/ > All releases are always archived at archive.apache.org and continue to be > available. The JS behind spark.apache.org/downloads.html needs to be updated > to point at archive.apache.org for older releases, then. > There won't be a pull request for this as it's strictly an update to the site > hosted in SVN, and the files hosted by Apache. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10877: Assignee: Davies Liu > Assertions fail straightforward DataFrame job due to word alignment > --- > > Key: SPARK-10877 > URL: https://issues.apache.org/jira/browse/SPARK-10877 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Matt Cheah >Assignee: Davies Liu > Attachments: SparkFilterByKeyTest.scala > > > I have some code that I’m running in a unit test suite, but the code I’m > running is failing with an assertion error. > I have translated the JUnit test that was failing, to a Scala script that I > will attach to the ticket. The assertion error is the following: > {code} > Exception in thread "main" org.apache.spark.SparkException: Job aborted due > to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: > Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: > lengthInBytes must be a multiple of 8 (word-aligned) > at > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53) > at > org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149) > at > org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247) > at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > {code} > However, it turns out that this code actually works normally and computes the > correct result if assertions are turned off. 
> I traced the code and found that when hashUnsafeWords was called, it was > given a byte-length of 12, which clearly is not a multiple of 8. However, the > job seems to compute correctly regardless of this fact. Of course, I can’t > just disable assertions for my unit test though. > A few things we need to understand: > 1. Why is the lengthInBytes of size 12? > 2. Is it actually a problem that the byte length is not word-aligned? If so, > how should we fix the byte length? If it's not a problem, why is the > assertion flagging a false negative? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
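The failing assertion requires lengthInBytes to be a multiple of 8, i.e. aligned to one 8-byte word, which is why the observed length of 12 trips it. A small illustration of the alignment check and the usual round-up-to-a-word padding; this mirrors the assertion's arithmetic only, not Spark's Murmur3_x86_32 code.

```python
WORD_SIZE = 8  # bytes per word in the unsafe-memory layout discussed above

def is_word_aligned(length_in_bytes):
    # Mirrors the failing assertion: lengthInBytes must be a multiple of 8.
    return length_in_bytes % WORD_SIZE == 0

def round_up_to_word(length_in_bytes):
    # Smallest multiple of 8 that is >= the input; the standard way to
    # pad a byte length up to word alignment (12 -> 16, 16 -> 16).
    return (length_in_bytes + WORD_SIZE - 1) & ~(WORD_SIZE - 1)
```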
[jira] [Updated] (SPARK-11110) Scala 2.11 build fails due to compiler errors
[ https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11110: Assignee: Jakob Odersky > Scala 2.11 build fails due to compiler errors > - > > Key: SPARK-11110 > URL: https://issues.apache.org/jira/browse/SPARK-11110 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Patrick Wendell >Assignee: Jakob Odersky > > Right now the 2.11 build is failing due to compiler errors in SBT (though not > in Maven). I have updated our 2.11 compile test harness to catch this. > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull > {code} > [error] > /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: > no valid targets for annotation on value conf - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf) > [error] > {code} > This is one error, but there may be others past this point (the compile fails > fast). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11110) Scala 2.11 build fails due to compiler errors
[ https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11110: Priority: Critical (was: Major) > Scala 2.11 build fails due to compiler errors > - > > Key: SPARK-11110 > URL: https://issues.apache.org/jira/browse/SPARK-11110 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Patrick Wendell >Assignee: Jakob Odersky >Priority: Critical > > Right now the 2.11 build is failing due to compiler errors in SBT (though not > in Maven). I have updated our 2.11 compile test harness to catch this. > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull > {code} > [error] > /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: > no valid targets for annotation on value conf - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf) > [error] > {code} > This is one error, but there may be others past this point (the compile fails > fast). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11110) Scala 2.11 build fails due to compiler errors
Patrick Wendell created SPARK-11110: --- Summary: Scala 2.11 build fails due to compiler errors Key: SPARK-11110 URL: https://issues.apache.org/jira/browse/SPARK-11110 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Right now the 2.11 build is failing due to compiler errors in SBT (though not in Maven). I have updated our 2.11 compile test harness to catch this. https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull {code} [error] /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: no valid targets for annotation on value conf - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf) [error] {code} This is one error, but there may be others past this point (the compile fails fast). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11115) IPv6 regression
[ https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958078#comment-14958078 ] Patrick Wendell edited comment on SPARK-11115 at 10/15/15 12:38 AM: The title of this says "Regression" - did it regress from a previous version? I am going to update the title, let me know if there is any issue. was (Author: pwendell): The title of this says "Regression" - did it regression from a previous version? I am going to update the title, let me know if there is any issue. > IPv6 regression > --- > > Key: SPARK-11115 > URL: https://issues.apache.org/jira/browse/SPARK-11115 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6 >Reporter: Thomas Dudziak >Priority: Critical > > When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error: > 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext. > java.lang.AssertionError: assertion failed: Expected hostname > at scala.Predef$.assert(Predef.scala:179) > at org.apache.spark.util.Utils$.checkHost(Utils.scala:805) > at > org.apache.spark.storage.BlockManagerId.(BlockManagerId.scala:48) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190) > at org.apache.spark.SparkContext.(SparkContext.scala:528) > at > org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017) > Looking at the code in question, it seems that the code will only work for > IPv4 as it assumes ':' can't be part of the hostname (which it clearly can > for IPv6 addresses).
> Instead, the code should probably use Guava's HostAndPort class, i.e.: > def checkHost(host: String, message: String = "") { > assert(!HostAndPort.fromString(host).hasPort, message) > } > def checkHostPort(hostPort: String, message: String = "") { > assert(HostAndPort.fromString(hostPort).hasPort, message) > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11081) Shade Jersey dependency to work around the compatibility issue with Jersey2
[ https://issues.apache.org/jira/browse/SPARK-11081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11081: Component/s: Build > Shade Jersey dependency to work around the compatibility issue with Jersey2 > --- > > Key: SPARK-11081 > URL: https://issues.apache.org/jira/browse/SPARK-11081 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Reporter: Mingyu Kim > > As seen from this thread > (https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALte62yD8H3=2KVMiFs7NZjn929oJ133JkPLrNEj=vrx-d2...@mail.gmail.com%3E), > Spark is incompatible with Jersey 2 especially when Spark is embedded in an > application running with Jersey. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.
[ https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11092: Assignee: Jakob Odersky > Add source URLs to API documentation. > - > > Key: SPARK-11092 > URL: https://issues.apache.org/jira/browse/SPARK-11092 > Project: Spark > Issue Type: Documentation > Components: Build, Documentation >Reporter: Jakob Odersky >Assignee: Jakob Odersky >Priority: Trivial > > It would be nice to have source URLs in the Spark scaladoc, similar to the > standard library (e.g. > http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). > The fix should be really simple, just adding a line to the sbt unidoc > settings. > I'll use the github repo url > bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH} > Feel free to tell me if I should use something else as base url. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11115) Host verification is not correct for IPv6
[ https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11115: Summary: Host verification is not correct for IPv6 (was: IPv6 regression) > Host verification is not correct for IPv6 > - > > Key: SPARK-11115 > URL: https://issues.apache.org/jira/browse/SPARK-11115 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6 >Reporter: Thomas Dudziak >Priority: Critical > > When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error: > 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext. > java.lang.AssertionError: assertion failed: Expected hostname > at scala.Predef$.assert(Predef.scala:179) > at org.apache.spark.util.Utils$.checkHost(Utils.scala:805) > at > org.apache.spark.storage.BlockManagerId.(BlockManagerId.scala:48) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190) > at org.apache.spark.SparkContext.(SparkContext.scala:528) > at > org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017) > Looking at the code in question, it seems that the code will only work for > IPv4 as it assumes ':' can't be part of the hostname (which it clearly can > for IPv6 addresses). > Instead, the code should probably use Guava's HostAndPort class, i.e.: > def checkHost(host: String, message: String = "") { > assert(!HostAndPort.fromString(host).hasPort, message) > } > def checkHostPort(hostPort: String, message: String = "") { > assert(HostAndPort.fromString(hostPort).hasPort, message) > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11115) IPv6 regression
[ https://issues.apache.org/jira/browse/SPARK-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958078#comment-14958078 ] Patrick Wendell commented on SPARK-11115: - The title of this says "Regression" - did it regress from a previous version? I am going to update the title, let me know if there is any issue. > IPv6 regression > --- > > Key: SPARK-11115 > URL: https://issues.apache.org/jira/browse/SPARK-11115 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: CentOS 6.7, Java 1.8.0_25, dual stack IPv4 + IPv6 >Reporter: Thomas Dudziak >Priority: Critical > > When running Spark with -Djava.net.preferIPv6Addresses=true, I get this error: > 15/10/14 14:36:01 ERROR SparkContext: Error initializing SparkContext. > java.lang.AssertionError: assertion failed: Expected hostname > at scala.Predef$.assert(Predef.scala:179) > at org.apache.spark.util.Utils$.checkHost(Utils.scala:805) > at > org.apache.spark.storage.BlockManagerId.(BlockManagerId.scala:48) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:107) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:190) > at org.apache.spark.SparkContext.(SparkContext.scala:528) > at > org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017) > Looking at the code in question, it seems that the code will only work for > IPv4 as it assumes ':' can't be part of the hostname (which it clearly can > for IPv6 addresses). > Instead, the code should probably use Guava's HostAndPort class, i.e.: > def checkHost(host: String, message: String = "") { > assert(!HostAndPort.fromString(host).hasPort, message) > } > def checkHostPort(hostPort: String, message: String = "") { > assert(HostAndPort.fromString(hostPort).hasPort, message) > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
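The fix proposed in the report replaces ad-hoc ':' handling with Guava's HostAndPort, whose key property is that a bracketed IPv6 literal like "[::1]:8080" is split on the closing bracket rather than on the first ':'. A rough Python stand-in for that parsing rule, for illustration only; it is not Guava's API and skips Guava's validation.

```python
def has_port(host_port):
    """Return True if the string carries an explicit port.

    Simplified stand-in for Guava's HostAndPort.hasPort: handles
    bracketed IPv6 literals as well as plain hostnames/IPv4.
    """
    if host_port.startswith("["):
        # IPv6 literal: a port can only follow the closing bracket.
        bracket = host_port.index("]")
        return host_port[bracket + 1:].startswith(":")
    # Plain host or IPv4: exactly one ':' separates host from port;
    # more than one ':' means an unbracketed IPv6 address, no port.
    return host_port.count(":") == 1
```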
[jira] [Updated] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor
[ https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11006: Component/s: SQL > Rename NullColumnAccess as NullColumnAccessor > - > > Key: SPARK-11006 > URL: https://issues.apache.org/jira/browse/SPARK-11006 > Project: Spark > Issue Type: Task > Components: SQL >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Trivial > Fix For: 1.6.0 > > > In sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala > , NullColumnAccess should be renamed as NullColumnAccessor so that the same > convention is adhered to for the accessors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11111) Fast null-safe join
[ https://issues.apache.org/jira/browse/SPARK-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11111: Component/s: SQL > Fast null-safe join > --- > > Key: SPARK-11111 > URL: https://issues.apache.org/jira/browse/SPARK-11111 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > > Today, null safe joins are executed with a Cartesian product. > {code} > scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain > == Physical Plan == > TungstenProject [i#2,j#3,i#7,j#8] > Filter (i#2 <=> i#7) > CartesianProduct >LocalTableScan [i#2,j#3], [[1,1]] >LocalTableScan [i#7,j#8], [[1,1]] > {code} > One option is to add this rewrite to the optimizer: > {code} > select * > from t a > join t b > on coalesce(a.i, ) = coalesce(b.i, ) AND (a.i <=> b.i) > {code} > Acceptance criteria: joins with only null safe equality should not result in > a Cartesian product. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
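The reason the proposed rewrite helps: coalescing both join keys to a common placeholder gives the planner an ordinary equality key it can hash on, with the null-safe predicate kept as a residual filter, so NULL keys match each other without a Cartesian product. A hedged Python sketch of that idea follows; `SENTINEL` stands in for the placeholder value elided in the ticket, and all names are invented for illustration:

```python
SENTINEL = object()  # stands in for the elided coalesce placeholder


def null_safe_eq(a, b):
    # SQL's <=> operator: NULL <=> NULL is true, NULL <=> x is false.
    return a == b or (a is None and b is None)


def null_safe_join(left, right, key):
    # Hash join on coalesce(key, SENTINEL): O(n + m) instead of the
    # O(n * m) Cartesian product the current plan falls back to.
    table = {}
    for row in right:
        k = row[key] if row[key] is not None else SENTINEL
        table.setdefault(k, []).append(row)
    out = []
    for row in left:
        k = row[key] if row[key] is not None else SENTINEL
        for match in table.get(k, []):
            # Residual null-safe check, as in the rewritten ON clause.
            if null_safe_eq(row[key], match[key]):
                out.append((row, match))
    return out
```

Rows with `i = NULL` on both sides pair up exactly once, which is the acceptance criterion stated above.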
[jira] [Updated] (SPARK-11056) Improve documentation on how to build Spark efficiently
[ https://issues.apache.org/jira/browse/SPARK-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-11056: Component/s: Documentation > Improve documentation on how to build Spark efficiently > --- > > Key: SPARK-11056 > URL: https://issues.apache.org/jira/browse/SPARK-11056 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.5.2, 1.6.0 > > > Slow build times are a common pain point for new Spark developers. We should > improve the main documentation on building Spark to describe how to make > building Spark less painful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6230) Provide authentication and encryption for Spark's RPC
[ https://issues.apache.org/jira/browse/SPARK-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954523#comment-14954523 ] Patrick Wendell commented on SPARK-6230: Should we update Spark's documentation to explain this? I think at present it only discusses encrypted RPC via akka. But this will be the new recommended way to encrypt RPC. > Provide authentication and encryption for Spark's RPC > - > > Key: SPARK-6230 > URL: https://issues.apache.org/jira/browse/SPARK-6230 > Project: Spark > Issue Type: Sub-task > Components: YARN >Reporter: Marcelo Vanzin > > Make sure the RPC layer used by Spark supports the auth and encryption > features of the network/common module. > This kinda ignores akka; adding support for SASL to akka, while possible, > seems to be at odds with the direction being taken in Spark, so let's > restrict this to the new RPC layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (FLINK-2699) Flink is filling Spark JIRA with incorrect PR links
Patrick Wendell created FLINK-2699: -- Summary: Flink is filling Spark JIRA with incorrect PR links Key: FLINK-2699 URL: https://issues.apache.org/jira/browse/FLINK-2699 Project: Flink Issue Type: Bug Reporter: Patrick Wendell Priority: Blocker I think you guys are using our script for synchronizing JIRA. However, you didn't adjust the target JIRA identifier so it is still posting to Spark. In the past few hours we've seen a lot of random Flink pull requests being linked on the Spark JIRA. This is obviously not desirable for us since they are different projects. The JIRA links are being created by the user "Maximilian Michels" ([~mxm]). https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mxm I saw these as recently as 5 hours ago - but if you've fixed it already go ahead and close this. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2699) Flink is filling Spark JIRA with incorrect PR links
[ https://issues.apache.org/jira/browse/FLINK-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated FLINK-2699: --- Description: I think you guys are using our script for synchronizing JIRA. However, you didn't adjust the target JIRA identifier so it is still posting to Spark. In the past few hours we've seen a lot of random Flink pull requests being linked on the Spark JIRA. This is obviously not desirable for us since they are different projects. The JIRA links are being created by the user "Maximilian Michels" ([~mxm]). https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mxm I saw these as recently as 5 hours ago. There are around 23 links that were created - if you could go ahead and remove them that would be useful. Thanks! was: I think you guys are using our script for synchronizing JIRA. However, you didn't adjust the target JIRA identifier so it is still posting to Spark. In the past few hours we've seen a lot of random Flink pull requests being linked on the Spark JIRA. This is obviously not desirable for us since they are different projects. The JIRA links are being created by the user "Maximilian Michels" ([~mxm]). https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mxm I saw these as recently as 5 hours ago - but if you've fixed it already go ahead and close this. Thanks. > Flink is filling Spark JIRA with incorrect PR links > --- > > Key: FLINK-2699 > URL: https://issues.apache.org/jira/browse/FLINK-2699 > Project: Flink > Issue Type: Bug >Reporter: Patrick Wendell >Priority: Blocker > > I think you guys are using our script for synchronizing JIRA. However, you > didn't adjust the target JIRA identifier so it is still posting to Spark. In > the past few hours we've seen a lot of random Flink pull requests being > linked on the Spark JIRA. This is obviously not desirable for us since they > are different projects. > The JIRA links are being created by the user "Maximilian Michels" ([~mxm]). 
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mxm > I saw these as recently as 5 hours ago. There are around 23 links that were > created - if you could go ahead and remove them that would be useful. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2699) Flink is filling Spark JIRA with incorrect PR links
[ https://issues.apache.org/jira/browse/FLINK-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804497#comment-14804497 ] Patrick Wendell commented on FLINK-2699: Great - thanks for cleaning this up. No worries. > Flink is filling Spark JIRA with incorrect PR links > --- > > Key: FLINK-2699 > URL: https://issues.apache.org/jira/browse/FLINK-2699 > Project: Flink > Issue Type: Bug >Reporter: Patrick Wendell >Priority: Blocker > > I think you guys are using our script for synchronizing JIRA. However, you > didn't adjust the target JIRA identifier so it is still posting to Spark. In > the past few hours we've seen a lot of random Flink pull requests being > linked on the Spark JIRA. This is obviously not desirable for us since they > are different projects. > The JIRA links are being created by the user "Maximilian Michels" ([~mxm]). > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mxm > I saw these as recently as 5 hours ago. There are around 23 links that were > created - if you could go ahead and remove them that would be useful. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Description: In 1.5.0 there are some extra classes in the Spark docs - including a bunch of test classes. We need to figure out what commit introduced those and fix it. The obvious things like genJavadoc version have not changed. http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before] http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after] > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Priority: Critical (was: Major) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Affects Version/s: 1.5.0 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes
Patrick Wendell created SPARK-10650: --- Summary: Spark docs include test and other extra classes Key: SPARK-10650 URL: https://issues.apache.org/jira/browse/SPARK-10650 Project: Spark Issue Type: Bug Components: Documentation Reporter: Patrick Wendell Assignee: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Target Version/s: 1.5.1 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6942) Umbrella: UI Visualizations for Core and Dataframes
[ https://issues.apache.org/jira/browse/SPARK-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6942: --- Assignee: Andrew Or (was: Patrick Wendell) > Umbrella: UI Visualizations for Core and Dataframes > > > Key: SPARK-6942 > URL: https://issues.apache.org/jira/browse/SPARK-6942 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, SQL, Web UI >Reporter: Patrick Wendell >Assignee: Andrew Or > Fix For: 1.5.0 > > > This is an umbrella issue for the assorted visualization proposals for > Spark's UI. The scope will likely cover Spark 1.4 and 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics
Patrick Wendell created SPARK-10620: --- Summary: Look into whether accumulator mechanism can replace TaskMetrics Key: SPARK-10620 URL: https://issues.apache.org/jira/browse/SPARK-10620 Project: Spark Issue Type: Task Components: Spark Core Reporter: Patrick Wendell Assignee: Andrew Or This task is simply to explore whether the internal representation used by TaskMetrics could be performed by using accumulators rather than having two separate mechanisms. Note that we need to continue to preserve the existing "Task Metric" data structures that are exposed to users through event logs etc. The question is can we use a single internal codepath and perhaps make this easier to extend in the future. I think there are a few things to look into: - How do the semantics of accumulators on stage retries differ from aggregate TaskMetrics for a stage? Could we implement clearer retry semantics for internal accumulators to allow them to be the same - for instance, zeroing accumulator values if a stage is retried (see discussion here: SPARK-10042). - Are there metrics that do not fit well into the accumulator model, or would be difficult to update as an accumulator. - If we expose metrics through accumulators in the future rather than continuing to add fields to TaskMetrics, what is the best way to coerce compatibility? - Is it worth it to do this, or is the consolidation too complicated to justify? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
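The stage-retry question above can be made concrete with a toy model. The class and method names below are entirely hypothetical - this is only a sketch of the "zero the failed attempt's values" semantics discussed in the description (and in SPARK-10042), not Spark's implementation:

```python
class InternalAccumulator:
    """Toy accumulator that tracks partial sums per stage attempt,
    so a retried attempt's updates can be discarded instead of
    double-counting them in the stage aggregate."""

    def __init__(self):
        self._per_attempt = {}  # stage attempt id -> partial sum

    def add(self, attempt, value):
        self._per_attempt[attempt] = self._per_attempt.get(attempt, 0) + value

    def on_stage_retry(self, attempt):
        # Zero out the failed attempt: its partial updates are dropped.
        self._per_attempt.pop(attempt, None)

    @property
    def value(self):
        return sum(self._per_attempt.values())


acc = InternalAccumulator()
acc.add(0, 10)        # attempt 0 runs some tasks, then the stage fails
acc.on_stage_retry(0)  # retry discards attempt 0's partial metrics
acc.add(1, 25)        # attempt 1 reruns all tasks
```

Under these semantics the stage aggregate would equal the successful attempt's total (25 here), matching what aggregate TaskMetrics report today.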
[jira] [Updated] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10620: Description: This task is simply to explore whether the internal representation used by TaskMetrics could be performed by using accumulators rather than having two separate mechanisms. Note that we need to continue to preserve the existing "Task Metric" data structures that are exposed to users through event logs etc. The question is can we use a single internal codepath and perhaps make this easier to extend in the future. I think a full exploration would answer the following questions: - How do the semantics of accumulators on stage retries differ from aggregate TaskMetrics for a stage? Could we implement clearer retry semantics for internal accumulators to allow them to be the same - for instance, zeroing accumulator values if a stage is retried (see discussion here: SPARK-10042). - Are there metrics that do not fit well into the accumulator model, or would be difficult to update as an accumulator. - If we expose metrics through accumulators in the future rather than continuing to add fields to TaskMetrics, what is the best way to coerce compatibility? - Are there any other considerations? - Is it worth it to do this, or is the consolidation too complicated to justify? was: This task is simply to explore whether the internal representation used by TaskMetrics could be performed by using accumulators rather than having two separate mechanisms. Note that we need to continue to preserve the existing "Task Metric" data structures that are exposed to users through event logs etc. The question is can we use a single internal codepath and perhaps make this easier to extend in the future. I think there are a few things to look into: - How do the semantics of accumulators on stage retries differ from aggregate TaskMetrics for a stage? 
Could we implement clearer retry semantics for internal accumulators to allow them to be the same - for instance, zeroing accumulator values if a stage is retried (see discussion here: SPARK-10042). - Are there metrics that do not fit well into the accumulator model, or would be difficult to update as an accumulator. - If we expose metrics through accumulators in the future rather than continuing to add fields to TaskMetrics, what is the best way to coerce compatibility? - Is it worth it to do this, or is the consolidation too complicated to justify? > Look into whether accumulator mechanism can replace TaskMetrics > --- > > Key: SPARK-10620 > URL: https://issues.apache.org/jira/browse/SPARK-10620 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Andrew Or > > This task is simply to explore whether the internal representation used by > TaskMetrics could be performed by using accumulators rather than having two > separate mechanisms. Note that we need to continue to preserve the existing > "Task Metric" data structures that are exposed to users through event logs > etc. The question is can we use a single internal codepath and perhaps make > this easier to extend in the future. > I think a full exploration would answer the following questions: > - How do the semantics of accumulators on stage retries differ from aggregate > TaskMetrics for a stage? Could we implement clearer retry semantics for > internal accumulators to allow them to be the same - for instance, zeroing > accumulator values if a stage is retried (see discussion here: SPARK-10042). > - Are there metrics that do not fit well into the accumulator model, or would > be difficult to update as an accumulator. > - If we expose metrics through accumulators in the future rather than > continuing to add fields to TaskMetrics, what is the best way to coerce > compatibility? > - Are there any other considerations? 
> - Is it worth it to do this, or is the consolidation too complicated to > justify? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10620) Look into whether accumulator mechanism can replace TaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745690#comment-14745690 ] Patrick Wendell commented on SPARK-10620: - /cc [~imranr] and [~srowen] for any comments. In my mind the goal here is just to produce some design thoughts and not to actually do it (at this point). > Look into whether accumulator mechanism can replace TaskMetrics > --- > > Key: SPARK-10620 > URL: https://issues.apache.org/jira/browse/SPARK-10620 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Andrew Or > > This task is simply to explore whether the internal representation used by > TaskMetrics could be performed by using accumulators rather than having two > separate mechanisms. Note that we need to continue to preserve the existing > "Task Metric" data structures that are exposed to users through event logs > etc. The question is can we use a single internal codepath and perhaps make > this easier to extend in the future. > I think a full exploration would answer the following questions: > - How do the semantics of accumulators on stage retries differ from aggregate > TaskMetrics for a stage? Could we implement clearer retry semantics for > internal accumulators to allow them to be the same - for instance, zeroing > accumulator values if a stage is retried (see discussion here: SPARK-10042). > - Are there metrics that do not fit well into the accumulator model, or would > be difficult to update as an accumulator. > - If we expose metrics through accumulators in the future rather than > continuing to add fields to TaskMetrics, what is the best way to coerce > compatibility? > - Are there any other considerations? > - Is it worth it to do this, or is the consolidation too complicated to > justify? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10511) Source releases should not include maven jars
[ https://issues.apache.org/jira/browse/SPARK-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10511: Assignee: Luciano Resende > Source releases should not include maven jars > - > > Key: SPARK-10511 > URL: https://issues.apache.org/jira/browse/SPARK-10511 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Luciano Resende >Priority: Blocker > > I noticed our source jars seemed really big for 1.5.0. At least one > contributing factor is that, likely due to some change in the release script, > the maven jars are being bundled in with the source code in our build > directory. This runs afoul of the ASF policy on binaries in source releases - > we should fix it in 1.5.1. > The issue (I think) is that we might invoke maven to compute the version > between when we checkout Spark from github and when we package the source > file. I think it could be fixed by simply clearing out the build/ directory > after that statement runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10623: Component/s: SQL > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10601) Spark SQL - Support for MINUS
[ https://issues.apache.org/jira/browse/SPARK-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10601: Component/s: SQL > Spark SQL - Support for MINUS > - > > Key: SPARK-10601 > URL: https://issues.apache.org/jira/browse/SPARK-10601 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Richard Garris > > Spark SQL does not currently support SQL MINUS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
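For context on the requested semantics: MINUS is Oracle's spelling of ANSI SQL's EXCEPT, a set difference that also eliminates duplicates. SQLite supports EXCEPT, which allows a small self-contained demo using only the Python standard library (the table names here are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(x INTEGER);
    CREATE TABLE b(x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2);
""")

# EXCEPT (MINUS) removes rows present in b and collapses duplicates.
rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()
print(rows)  # [(1,), (3,)]
```

Note the duplicate-elimination: the two copies of 2 in `a` are both removed, and remaining rows are deduplicated, which distinguishes MINUS/EXCEPT from a simple anti-join filter.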
[jira] [Updated] (SPARK-10600) SparkSQL - Support for Not Exists in a Correlated Subquery
[ https://issues.apache.org/jira/browse/SPARK-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10600: Component/s: SQL > SparkSQL - Support for Not Exists in a Correlated Subquery > -- > > Key: SPARK-10600 > URL: https://issues.apache.org/jira/browse/SPARK-10600 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Richard Garris > > Spark SQL currently does not support NOT EXISTS clauses (e.g. > SELECT * FROM TABLE_A WHERE NOT EXISTS ( SELECT 1 FROM TABLE_B WHERE > TABLE_B.id = TABLE_A.id)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
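The requested construct is essentially an anti-join: keep each TABLE_A row for which no matching TABLE_B row exists. A self-contained SQLite demo of the correlated NOT EXISTS semantics, with table names loosely mirroring the report's example (the data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a(id INTEGER, v TEXT);
    CREATE TABLE table_b(id INTEGER);
    INSERT INTO table_a VALUES (1, 'keep'), (2, 'drop');
    INSERT INTO table_b VALUES (2);
""")

# The subquery is correlated: table_a.id refers to the outer row.
rows = conn.execute("""
    SELECT v FROM table_a
    WHERE NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id)
""").fetchall()
print(rows)  # [('keep',)]
```

Engines typically plan this as a left anti-join rather than re-executing the subquery per outer row, which is presumably what a Spark SQL implementation would do as well.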
[jira] [Commented] (SPARK-10576) Move .java files out of src/main/scala
[ https://issues.apache.org/jira/browse/SPARK-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742280#comment-14742280 ] Patrick Wendell commented on SPARK-10576: - FWIW - seems to me like moving them into /java makes sense. If we are going to have src/main/scala and src/main/java, might as well use them correctly. What do you think [~rxin]. > Move .java files out of src/main/scala > -- > > Key: SPARK-10576 > URL: https://issues.apache.org/jira/browse/SPARK-10576 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.5.0 >Reporter: Sean Owen >Priority: Minor > > (I suppose I'm really asking for an opinion on this, rather than asserting it > must be done, but seems worthwhile. CC [~rxin] and [~pwendell]) > As pointed out on the mailing list, there are some Java files in the Scala > source tree: > {code} > ./bagel/src/main/scala/org/apache/spark/bagel/package-info.java > ./core/src/main/scala/org/apache/spark/annotation/AlphaComponent.java > ./core/src/main/scala/org/apache/spark/annotation/DeveloperApi.java > ./core/src/main/scala/org/apache/spark/annotation/Experimental.java > ./core/src/main/scala/org/apache/spark/annotation/package-info.java > ./core/src/main/scala/org/apache/spark/annotation/Private.java > ./core/src/main/scala/org/apache/spark/api/java/package-info.java > ./core/src/main/scala/org/apache/spark/broadcast/package-info.java > ./core/src/main/scala/org/apache/spark/executor/package-info.java > ./core/src/main/scala/org/apache/spark/io/package-info.java > ./core/src/main/scala/org/apache/spark/rdd/package-info.java > ./core/src/main/scala/org/apache/spark/scheduler/package-info.java > ./core/src/main/scala/org/apache/spark/serializer/package-info.java > ./core/src/main/scala/org/apache/spark/util/package-info.java > ./core/src/main/scala/org/apache/spark/util/random/package-info.java > ./external/flume/src/main/scala/org/apache/spark/streaming/flume/package-info.java > 
./external/kafka/src/main/scala/org/apache/spark/streaming/kafka/package-info.java > ./external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/package-info.java > ./external/twitter/src/main/scala/org/apache/spark/streaming/twitter/package-info.java > ./external/zeromq/src/main/scala/org/apache/spark/streaming/zeromq/package-info.java > ./graphx/src/main/scala/org/apache/spark/graphx/impl/EdgeActiveness.java > ./graphx/src/main/scala/org/apache/spark/graphx/lib/package-info.java > ./graphx/src/main/scala/org/apache/spark/graphx/package-info.java > ./graphx/src/main/scala/org/apache/spark/graphx/TripletFields.java > ./graphx/src/main/scala/org/apache/spark/graphx/util/package-info.java > ./mllib/src/main/scala/org/apache/spark/ml/attribute/package-info.java > ./mllib/src/main/scala/org/apache/spark/ml/package-info.java > ./mllib/src/main/scala/org/apache/spark/mllib/package-info.java > ./sql/catalyst/src/main/scala/org/apache/spark/sql/types/SQLUserDefinedType.java > ./sql/hive/src/main/scala/org/apache/spark/sql/hive/package-info.java > ./streaming/src/main/scala/org/apache/spark/streaming/api/java/package-info.java > ./streaming/src/main/scala/org/apache/spark/streaming/dstream/package-info.java > ./streaming/src/main/scala/org/apache/spark/streaming/StreamingContextState.java > {code} > It happens to work since the Scala compiler plugin is handling both. > On its face, they should be in the Java source tree. I'm trying to figure out > if there are good reasons they have to be in this less intuitive location. > I might try moving them just to see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10511) Source releases should not include maven jars
Patrick Wendell created SPARK-10511: --- Summary: Source releases should not include maven jars Key: SPARK-10511 URL: https://issues.apache.org/jira/browse/SPARK-10511 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Patrick Wendell Priority: Blocker I noticed our source jars seemed really big for 1.5.0. At least one contributing factor is that, likely due to some change in the release script, the maven jars are being bundled in with the source code in our build directory. This runs afoul of the ASF policy on binaries in source releases - we should fix it in 1.5.1. The issue (I think) is that we might invoke maven to compute the version between when we checkout Spark from github and when we package the source file. I think it could be fixed by simply clearing out the build/ directory after that statement runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4
[ https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723792#comment-14723792 ] Patrick Wendell commented on SPARK-10374: - Hey Matt, I think the only thing that could have influenced you is that we changed our default advertised akka dependency. We used to advertise an older version of akka that shaded protobuf. What happens if you manually coerce that version of akka in your application? Spark itself doesn't directly use protobuf. But some of our dependencies do, including both akka and Hadoop. My guess is that you are now in a situation where you can't reconcile the akka and Hadoop protobuf versions and make them both happy. This would be consistent with the changes we made in 1.5 in SPARK-7042. The fix would be to exclude all com.typesafe.akka artifacts from Spark and manually add org.spark-project.akka to your build. However, since you didn't post a full stack trace, I can't know for sure whether it is akka that complains when you try to fix the protobuf version at 2.4. > Spark-core 1.5.0-RC2 can create version conflicts with apps depending on > protobuf-2.4 > - > > Key: SPARK-10374 > URL: https://issues.apache.org/jira/browse/SPARK-10374 > Project: Spark > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Matt Cheah > > My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that > depends on the Spark 1.5.0 libraries via Gradle, and Hadoop 2.0.0 libraries. > When I run the driver application, I can hit the following error: > {code} > … java.lang.UnsupportedOperationException: This is > supposed to be overridden by subclasses.
> at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180) > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108) > at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149) > {code} > This application used to work when pulling in Spark 1.4.1 dependencies, and > thus this is a regression. > I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark > 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf > 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark > modules. It appears that Spark used to shade its protobuf dependencies and > hence Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, > when I ran dependencyInsight again against Spark 1.5, it looks like > protobuf is no longer shaded from the Spark module.
> 1.4.1 dependencyInsight: > {code} > com.google.protobuf:protobuf-java:2.4.0a > +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0 > |\--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 > | +--- compile > | \--- org.apache.spark:spark-core_2.10:1.4.1 > | +--- compile > | +--- org.apache.spark:spark-sql_2.10:1.4.1 > | |\--- compile > | \--- org.apache.spark:spark-catalyst_2.10:1.4.1 > | \--- org.apache.spark:spark-sql_2.10:1.4.1 (*) > \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0 > \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*) > org.spark-project.protobuf:protobuf-java:2.5.0-spark > \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark > \--- org.apache.spark:spark-core_2.10:1.4.1 > +--- compile > +--- org.apache.spark:spark-sql_2.10:1.4.1 > |\--- compile > \--- org.apache.spark:spark-catalyst_2.10:1.4.1 >\--- org.apache.spark:spark-sql_2.10:1.4.1 (*) > {code} > 1.5.0-rc2 dependencyInsight: > {code} > com.google.protobuf:protobuf-java:2.5.0 (conflict resolution) > \--- com.typesafe.akka:akka-remote_2.10:2.3.11 > \--- org.apache.spark:spark-core_2.10:1.5.0-rc2 > +--- compile > +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 > |\--- compile > \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2 >\--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*) > com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0 > +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0 > |\--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 > | +--- compile > | \--- org.apache.spark:spark-core_2.10:1.5.0-rc2 > | +--- compile > | +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 > | |\--- compile > | \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2 > | \---
[jira] [Commented] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests
[ https://issues.apache.org/jira/browse/SPARK-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723844#comment-14723844 ] Patrick Wendell commented on SPARK-10359: - The approach in SPARK-4123 was a bit different, but there is some overlap. We ended up reverting that patch because it wasn't working consistently. I'll close that one as a dup of this one. > Enumerate Spark's dependencies in a file and diff against it for new pull > requests > --- > > Key: SPARK-10359 > URL: https://issues.apache.org/jira/browse/SPARK-10359 > Project: Spark > Issue Type: New Feature > Components: Build >Reporter: Patrick Wendell >Assignee: Patrick Wendell > > Sometimes when we have dependency changes it can be pretty unclear what > transitive set of things are changing. If we enumerate all of the > dependencies and put them in a source file in the repo, we can make it so > that it is very explicit what is changing.
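The proposed check can be sketched as follows. This is a hedged sketch, not the eventual Spark implementation: the manifest path and function name are assumed for illustration.

```shell
# Hedged sketch: compare a checked-in dependency manifest against a freshly
# generated listing, failing loudly so a pull request must update the manifest
# whenever its transitive dependency set changes.
check_dependency_manifest() {
  local expected="$1"   # e.g. a deps file committed to the repo (assumed path)
  local actual="$2"     # listing generated for the current build
  if ! diff -u "$expected" "$actual"; then
    echo "Dependency set changed; regenerate the manifest if intentional." >&2
    return 1
  fi
}

# The listing itself could be produced roughly as in SPARK-4123, e.g.:
#   mvn dependency:build-classpath -pl assembly | grep -v INFO \
#     | tr : '\n' | awk -F/ '{print $NF}' | sort > "$actual"
```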
[jira] [Resolved] (SPARK-4123) Show dependency changes in pull requests
[ https://issues.apache.org/jira/browse/SPARK-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4123. Resolution: Duplicate I've proposed a slightly different approach in SPARK-10359, so I'm closing this since there is high overlap. > Show dependency changes in pull requests > > > Key: SPARK-4123 > URL: https://issues.apache.org/jira/browse/SPARK-4123 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Patrick Wendell >Assignee: Brennon York >Priority: Critical > > We should inspect the classpath of Spark's assembly jar for every pull > request. This only takes a few seconds in Maven and it will help weed out > dependency changes from the master branch. Ideally we'd post any dependency > changes in the pull request message.
> {code}
> $ mvn -Phive -Phadoop-2.4 dependency:build-classpath -pl assembly | grep -v INFO | tr : "\n" | awk -F/ '{print $NF}' | sort > my-classpath
> $ git checkout apache/master
> $ mvn -Phive -Phadoop-2.4 dependency:build-classpath -pl assembly | grep -v INFO | tr : "\n" | awk -F/ '{print $NF}' | sort > master-classpath
> $ diff my-classpath master-classpath
> < chill-java-0.3.6.jar
> < chill_2.10-0.3.6.jar
> ---
> > chill-java-0.5.0.jar
> > chill_2.10-0.5.0.jar
> {code}
[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has [test-maven] in it
[ https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-9545: --- Summary: Run Maven tests in pull request builder if title has [test-maven] in it (was: Run Maven tests in pull request builder if title has [maven-test] in it) Run Maven tests in pull request builder if title has [test-maven] in it - Key: SPARK-9545 URL: https://issues.apache.org/jira/browse/SPARK-9545 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.6.0 We have infrastructure now in the build tooling for running maven tests, but it's not actually used anywhere. With a very minor change we can support running maven tests if the pull request title has maven-test in it.
[jira] [Resolved] (SPARK-9547) Allow testing pull requests with different Hadoop versions
[ https://issues.apache.org/jira/browse/SPARK-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-9547. Resolution: Fixed Fix Version/s: 1.6.0 Allow testing pull requests with different Hadoop versions -- Key: SPARK-9547 URL: https://issues.apache.org/jira/browse/SPARK-9547 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.6.0 Similar to SPARK-9545 we should allow testing different Hadoop profiles in the PRB.
[jira] [Resolved] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
[ https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-9545. Resolution: Fixed Fix Version/s: 1.6.0 Run Maven tests in pull request builder if title has [maven-test] in it - Key: SPARK-9545 URL: https://issues.apache.org/jira/browse/SPARK-9545 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.6.0 We have infrastructure now in the build tooling for running maven tests, but it's not actually used anywhere. With a very minor change we can support running maven tests if the pull request title has maven-test in it.
[jira] [Created] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests
Patrick Wendell created SPARK-10359: --- Summary: Enumerate Spark's dependencies in a file and diff against it for new pull requests Key: SPARK-10359 URL: https://issues.apache.org/jira/browse/SPARK-10359 Project: Spark Issue Type: New Feature Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Sometimes when we have dependency changes it can be pretty unclear what transitive set of things are changing. If we enumerate all of the dependencies and put them in a source file in the repo, we can make it so that it is very explicit what is changing.
[jira] [Commented] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]
[ https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680885#comment-14680885 ] Patrick Wendell commented on SPARK-7726: [~srowen] [~dragos] This is cropping up again when trying to create a release candidate for Spark 1.5: https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Release-All-Java7/26/console Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher] - Key: SPARK-7726 URL: https://issues.apache.org/jira/browse/SPARK-7726 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Iulian Dragos Priority: Blocker Fix For: 1.4.0 This one took a long time to track down. The Maven install phase is part of our release process. It runs the scala:doc target to generate doc jars. Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in a way that breaks our build. In both cases, it returned an error (there has been a long running error here that we've always ignored), however in 2.11.3 that error became fatal and failed the entire build process. The upgrade occurred in SPARK-7092. Here is a simple reproduction:
{code}
./dev/change-version-to-2.11.sh
mvn clean install -pl network/common -pl network/shuffle -DskipTests -Dscala-2.11
{code}
This command exits success when Spark is at Scala 2.11.2 and fails with 2.11.3 or higher.
In either case an error is printed:
{code}
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 ---
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type
protected Type type() { return Type.UPLOAD_BLOCK; }
          ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type
protected Type type() { return Type.STREAM_HANDLE; }
          ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type
protected Type type() { return Type.REGISTER_EXECUTOR; }
          ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type
protected Type type() { return Type.OPEN_BLOCKS; }
          ^
model contains 22 documentable templates
four errors found
{code}
Ideally we'd just dig in and fix this error. Unfortunately it's a very confusing error and I have no idea why it is appearing. I'd propose reverting SPARK-7092 in the meantime.
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660796#comment-14660796 ] Patrick Wendell commented on SPARK-1517: Hey Ryan, IIRC - the Apache snapshot repository won't let us publish binaries that do not have SNAPSHOT in the version number. The reason is it expects to see timestamped snapshots so its garbage collection mechanism can work. We could look at adding sha1 hashes before SNAPSHOT, but I think there is some chance this would break their cleanup. In terms of posting more binaries - I can look at whether Databricks or Berkeley might be able to donate S3 resources for this, but it would have to be clearly maintained by those organizations and not branded as official Apache releases or anything like that. Publish nightly snapshots of documentation, maven artifacts, and binary builds -- Key: SPARK-1517 URL: https://issues.apache.org/jira/browse/SPARK-1517 Project: Spark Issue Type: Improvement Components: Build, Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Critical Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is to set up credentials so that jenkins can publish this stuff somewhere on apache infra. Ideally we don't want to have to put a private key on every jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the jenkins build can download the credentials provided we set a passphrase in an environment variable in jenkins. There may be simpler solutions as well.
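The credentials scheme described in the issue (encrypt with a passphrase, post the ciphertext publicly, decrypt on Jenkins via an environment variable) could be sketched like this. The tool choice and variable name are assumptions; the JIRA does not specify either.

```shell
# Hedged sketch of the idea above: encrypt publishing credentials so the
# ciphertext can sit somewhere publicly visible, with the passphrase held
# only in a Jenkins environment variable (name assumed).
encrypt_credentials() {
  openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$1" -out "$1.enc" -pass env:CRED_PASSPHRASE
}

decrypt_credentials() {
  # Writes the plaintext back alongside the .enc file.
  openssl enc -d -aes-256-cbc -pbkdf2 \
    -in "$1" -out "${1%.enc}" -pass env:CRED_PASSPHRASE
}
```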
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660420#comment-14660420 ] Patrick Wendell commented on SPARK-1517: Hey Ryan, For the maven snapshot releases - unfortunately we are constrained by maven's own SNAPSHOT version format which doesn't allow encoding anything other than the timestamp. It's just not supported in their SNAPSHOT mechanism. However, one thing we could see is whether we can align the timestamp with the time of the actual spark commit, rather than the time of publication of the SNAPSHOT release. I'm not sure if maven lets you provide a custom timestamp when publishing. If we had that feature users could look at the Spark commit log and do some manual association. For the binaries, the reason why the same commit appears multiple times is that we do the build every four hours and always publish the latest one even if it's a duplicate. However, this could be modified pretty easily to just avoid double-publishing the same commit if there hasn't been any code change. Maybe create a JIRA for this? In terms of how many older versions are available, the scripts we use for this have a tunable retention window. Right now I'm only keeping the last 4 builds; we could probably extend it to something like 10 builds. However, at some point I'm likely to blow out of space in my ASF user account. Since the binaries are quite large, I don't think, at least using ASF infrastructure, it's feasible to keep all past builds. We have 3000 commits in a typical Spark release, and it's a few gigs for each binary build. Publish nightly snapshots of documentation, maven artifacts, and binary builds -- Key: SPARK-1517 URL: https://issues.apache.org/jira/browse/SPARK-1517 Project: Spark Issue Type: Improvement Components: Build, Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Critical Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is to set up credentials so that jenkins can publish this stuff somewhere on apache infra. Ideally we don't want to have to put a private key on every jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the jenkins build can download the credentials provided we set a passphrase in an environment variable in jenkins. There may be simpler solutions as well.
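The tunable retention window mentioned in the comment could be sketched as follows. The directory layout and function name are assumptions; the actual publishing scripts are not shown in the issue.

```shell
# Hedged sketch of a retention window for nightly builds: keep only the
# newest N entries in a directory of published builds and delete the rest.
prune_old_builds() {
  local dir="$1" keep="$2"
  # List entries newest-first by modification time, skip the first $keep,
  # and remove whatever remains.
  ls -1t "$dir" | tail -n +"$((keep + 1))" | while read -r old; do
    rm -rf "${dir:?}/$old"
  done
}
```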
[jira] [Created] (SPARK-9547) Allow testing pull requests with different Hadoop versions
Patrick Wendell created SPARK-9547: -- Summary: Allow testing pull requests with different Hadoop versions Key: SPARK-9547 URL: https://issues.apache.org/jira/browse/SPARK-9547 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Similar to SPARK-9545 we should allow testing different Hadoop profiles in the PRB.
[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
[ https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-9545: --- Issue Type: Improvement (was: Bug) Run Maven tests in pull request builder if title has [maven-test] in it - Key: SPARK-9545 URL: https://issues.apache.org/jira/browse/SPARK-9545 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell We have infrastructure now in the build tooling for running maven tests, but it's not actually used anywhere. With a very minor change we can support running maven tests if the pull request title has maven-test in it.
[jira] [Created] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
Patrick Wendell created SPARK-9545: -- Summary: Run Maven tests in pull request builder if title has [maven-test] in it Key: SPARK-9545 URL: https://issues.apache.org/jira/browse/SPARK-9545 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell We have infrastructure now in the build tooling for running maven tests, but it's not actually used anywhere. With a very minor change we can support running maven tests if the pull request title has maven-test in it.
[jira] [Resolved] (SPARK-9423) Why do every other spark comiter keep suggesting to use spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-9423. Resolution: Invalid Why do every other spark comiter keep suggesting to use spark-submit script --- Key: SPARK-9423 URL: https://issues.apache.org/jira/browse/SPARK-9423 Project: Spark Issue Type: Question Components: Deploy Affects Versions: 1.3.1 Reporter: nirav patel I see that on spark forum and stackoverflow people keep suggesting to use spark-submit.sh script as a way (only way) to launch spark jobs? Are we still living in application server monolithic world where I need to run startup.sh ? What if spark application is long running context that serves multiple requests? What if user just don't want to use script? They want to embed spark as a service in their application. Please STOP suggesting user to use spark-submit script as an alternative.
[jira] [Commented] (SPARK-9423) Why do every other spark comiter keep suggesting to use spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645495#comment-14645495 ] Patrick Wendell commented on SPARK-9423: This is not a valid issue for JIRA (we use JIRA for project bugs and feature tracking). Please send an email to the spark-users list. Thanks. Why do every other spark comiter keep suggesting to use spark-submit script --- Key: SPARK-9423 URL: https://issues.apache.org/jira/browse/SPARK-9423 Project: Spark Issue Type: Question Components: Deploy Affects Versions: 1.3.1 Reporter: nirav patel I see that on spark forum and stackoverflow people keep suggesting to use spark-submit.sh script as a way (only way) to launch spark jobs? Are we still living in application server monolithic world where I need to run startup.sh ? What if spark application is long running context that serves multiple requests? What if user just don't want to use script? They want to embed spark as a service in their application. Please STOP suggesting user to use spark-submit script as an alternative.
[jira] [Created] (SPARK-9304) Improve backwards compatibility of SPARK-8401
Patrick Wendell created SPARK-9304: -- Summary: Improve backwards compatibility of SPARK-8401 Key: SPARK-9304 URL: https://issues.apache.org/jira/browse/SPARK-9304 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Michael Allman Priority: Critical In SPARK-8401 a backwards incompatible change was made to the Scala 2.11 build process. It would be good to add scripts with the older names to avoid breaking compatibility for harnesses or other automated builds that build for Scala 2.11. They can just be one-line shell scripts with a comment explaining they exist for backwards compatibility purposes. /cc [~srowen]
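Such a one-line compatibility shim might look like the following. The old script name is the one quoted in SPARK-7726's reproduction above; the new script name it delegates to is an assumption for illustration.

```shell
# Hedged sketch of the backwards-compatibility script described above: the
# old entry point simply delegates to its renamed replacement.
write_compat_shim() {
  local dev_dir="$1"
  cat > "$dev_dir/change-version-to-2.11.sh" <<'EOF'
#!/usr/bin/env bash
# Kept for backwards compatibility (SPARK-9304); use change-scala-version.sh instead.
exec "$(dirname "$0")"/change-scala-version.sh 2.11
EOF
  chmod +x "$dev_dir/change-version-to-2.11.sh"
}
```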
[jira] [Updated] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector
[ https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8703: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-8521 Add CountVectorizer as a ml transformer to convert document to words count vector - Key: SPARK-8703 URL: https://issues.apache.org/jira/browse/SPARK-8703 Project: Spark Issue Type: Sub-task Components: ML Reporter: yuhao yang Assignee: yuhao yang Fix For: 1.5.0 Original Estimate: 24h Remaining Estimate: 24h Converts a text document to a sparse vector of token counts. Similar to http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html I can further add an estimator to extract vocabulary from corpus if that's appropriate.
[jira] [Updated] (SPARK-8564) Add the Python API for Kinesis
[ https://issues.apache.org/jira/browse/SPARK-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8564: --- Target Version/s: 1.5.0 Add the Python API for Kinesis -- Key: SPARK-8564 URL: https://issues.apache.org/jira/browse/SPARK-8564 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Shixiong Zhu
[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable ( Fix Related Documentation Example).
[ https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7920: --- Labels: (was: spark.tc) Make MLlib ChiSqSelector Serializable ( Fix Related Documentation Example). Key: SPARK-7920 URL: https://issues.apache.org/jira/browse/SPARK-7920 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.1, 1.4.0 Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.0 The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector. ChiSqSelector should just extend Serializable. Steps: 1. Locate the MLlib ChiSqSelector documentation example. 2. Fix the example by adding an import statement for ChiSqSelector. 3. Attempt to run - notice that it will fail due to ChiSqSelector not being serializable.
[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions
[ https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8927: --- Labels: (was: spark.tc) Doc format wrong for some config descriptions - Key: SPARK-8927 URL: https://issues.apache.org/jira/browse/SPARK-8927 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.4.0 Reporter: Jon Alter Assignee: Jon Alter Priority: Trivial Fix For: 1.4.2, 1.5.0 In the docs, a couple descriptions of configuration (under Network) are not inside <td></td> tags and are being displayed immediately under the section title instead of in their row.
[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.
[ https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7985: --- Labels: (was: spark.tc) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples. Key: SPARK-7985 URL: https://issues.apache.org/jira/browse/SPARK-7985 Project: Spark Issue Type: Bug Components: Documentation, ML Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.0 Update ML Doc's Estimator, Transformer, and Param Scala and Java examples to use model.extractParamMap instead of model.fittingParamMap, which no longer exists. Remove all other references to fittingParamMap throughout Spark.
[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column
[ https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7969: --- Labels: (was: spark.tc) Drop method on Dataframes should handle Column -- Key: SPARK-7969 URL: https://issues.apache.org/jira/browse/SPARK-7969 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 1.4.0 Reporter: Olivier Girardot Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.1, 1.5.0 For now the drop method available on Dataframe since Spark 1.4.0 only accepts a column name (as a string); it should also accept a Column as input.
[jira] [Updated] (SPARK-7830) ML doc cleanup: logreg, classification link
[ https://issues.apache.org/jira/browse/SPARK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7830: --- Labels: (was: spark.tc) ML doc cleanup: logreg, classification link --- Key: SPARK-7830 URL: https://issues.apache.org/jira/browse/SPARK-7830 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Trivial Fix For: 1.4.0 Add logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, and fix related broken link.
[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides
[ https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8343: --- Labels: (was: spark.tc) Improve the Spark Streaming Guides -- Key: SPARK-8343 URL: https://issues.apache.org/jira/browse/SPARK-8343 Project: Spark Issue Type: Improvement Components: Documentation, Streaming Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.1, 1.5.0 Improve the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
[jira] [Updated] (SPARK-7977) Disallow println
[ https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7977: --- Labels: starter (was: spark.tc starter) Disallow println Key: SPARK-7977 URL: https://issues.apache.org/jira/browse/SPARK-7977 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Reynold Xin Assignee: Jon Alter Labels: starter Fix For: 1.5.0 Very often we see pull requests that add println calls left over from debugging, which the author forgot to remove before code review. We can use the regex checker to disallow println. For legitimate uses of println, we can then disable the rule where it is used. Add to the scalastyle-config.xml file:
{code}
<check customId="println" level="error" class="org.scalastyle.scalariform.TokenChecker" enabled="true">
  <parameters><parameter name="regex">^println$</parameter></parameters>
  <customMessage><![CDATA[Are you sure you want to println? If yes, wrap the code block with
  // scalastyle:off println
  println(...)
  // scalastyle:on println]]></customMessage>
</check>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8570) Improve MLlib Local Matrix Documentation.
[ https://issues.apache.org/jira/browse/SPARK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8570: --- Labels: (was: spark.tc) Improve MLlib Local Matrix Documentation. - Key: SPARK-8570 URL: https://issues.apache.org/jira/browse/SPARK-8570 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.5.0 Update the MLlib Data Types Local Matrix documentation as follows: -Include information on sparse matrices. -Add sparse matrix examples to the existing Scala and Java examples. -Add Python examples for both dense and sparse matrices (currently no Python examples exist for the Local Matrix section). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
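The sparse-matrix examples this issue requests concern MLlib's local SparseMatrix, which uses column-major CSC (compressed sparse column) storage: a values array, a row-index array, and column pointers. As a hedged illustration of how CSC indexing works (plain Python, not the MLlib API; the matrix values are made up for the example):

```python
# CSC storage for the 3x2 matrix
# [[1.0, 0.0],
#  [0.0, 3.0],
#  [2.0, 0.0]]
values = [1.0, 2.0, 3.0]   # nonzero entries, column by column
row_indices = [0, 2, 1]    # row of each nonzero entry
col_ptrs = [0, 2, 3]       # values[col_ptrs[j]:col_ptrs[j+1]] holds column j

def csc_get(i, j):
    """Return entry (i, j), scanning only column j's nonzeros."""
    for k in range(col_ptrs[j], col_ptrs[j + 1]):
        if row_indices[k] == i:
            return values[k]
    return 0.0
```

Lookups touch only one column's slice of the arrays, which is why CSC is cheap for column-oriented access but stores nothing for zero entries.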
[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.
[ https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7883: --- Labels: (was: spark.tc) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation. --- Key: SPARK-7883 URL: https://issues.apache.org/jira/browse/SPARK-7883 Project: Spark Issue Type: Bug Components: Documentation, MLlib Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0 Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Trivial Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0 The trainImplicit Scala example near the end of the MLlib Collaborative Filtering documentation refers to an ALS.trainImplicit function signature that does not exist. Rather than add an extra function, let's just fix the example. Currently, the example refers to a function that would have the following signature:
{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: Double): MatrixFactorizationModel
{code}
Instead, let's change the example to refer to this function, which does exist (notice the addition of the lambda parameter):
{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double, alpha: Double): MatrixFactorizationModel
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes
[ https://issues.apache.org/jira/browse/SPARK-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7426: --- Labels: (was: spark.tc) spark.ml AttributeFactory.fromStructField should allow other NumericTypes - Key: SPARK-7426 URL: https://issues.apache.org/jira/browse/SPARK-7426 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Assignee: Mike Dusenberry Priority: Minor Fix For: 1.5.0 It currently only supports DoubleType, but it should support others, at least for fromStructField (importing into ML attribute format, rather than exporting). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md
[ https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8639: --- Labels: (was: spark.tc) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md - Key: SPARK-8639 URL: https://issues.apache.org/jira/browse/SPARK-8639 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Rosstin Murphy Assignee: Rosstin Murphy Priority: Trivial Fix For: 1.4.1, 1.5.0 In docs/README.md, the text states around line 31 Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. It might be more clear if we said Execute 'jekyll build' from the 'docs/' directory to compile the site. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. In docs/api.md: Here you can API docs for Spark and its submodules. should be something like: Here you can read API docs for Spark and its submodules. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7357) Improving HBaseTest example
[ https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7357: --- Labels: (was: spark.tc) Improving HBaseTest example --- Key: SPARK-7357 URL: https://issues.apache.org/jira/browse/SPARK-7357 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Minor Fix For: 1.5.0 Original Estimate: 2m Remaining Estimate: 2m Minor improvement to the HBaseTest example: when HBase-related configurations (e.g. zookeeper quorum, zookeeper client port, or zookeeper.znode.parent) are not set to the default (localhost:2181), the connection to ZooKeeper might hang, as shown in the following stack trace:
{code}
15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx:2181 sessionTimeout=9 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown error)
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx/9.30.94.121:2181, initiating session
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, negotiated timeout = 4
15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
{code}
This is because hbase-site.xml is not placed on the Spark classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
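The hang described above goes away once the HBase client can see the real ZooKeeper settings. A minimal hbase-site.xml sketch that would be placed on the Spark classpath (the property names are standard HBase configuration keys; the host names and znode path are placeholders, not values from this issue):

{code}
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
{code}

Without this file on the classpath, the client silently falls back to localhost:2181 and blocks waiting for a ZooKeeper that is not there.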
[jira] [Updated] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
[ https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8746: --- Labels: documentation test (was: documentation spark.tc test) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest) -- Key: SPARK-8746 URL: https://issues.apache.org/jira/browse/SPARK-8746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Christian Kadner Assignee: Christian Kadner Priority: Trivial Labels: documentation, test Fix For: 1.4.1, 1.5.0 Original Estimate: 1h Remaining Estimate: 1h The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) describes how to generate golden answer files for new hive comparison test cases. However the download link for the Hive 0.13.1 jars points to https://hive.apache.org/downloads.html but none of the linked mirror sites still has the 0.13.1 version. We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
[ https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6485: --- Labels: (was: spark.tc) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark -- Key: SPARK-6485 URL: https://issues.apache.org/jira/browse/SPARK-6485 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark. Internally, we can use DataFrames for serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7744) Distributed matrix section in MLlib Data Types documentation should be reordered.
[ https://issues.apache.org/jira/browse/SPARK-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7744: --- Labels: (was: spark.tc) Distributed matrix section in MLlib Data Types documentation should be reordered. - Key: SPARK-7744 URL: https://issues.apache.org/jira/browse/SPARK-7744 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.3.2, 1.4.0 The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve comprehensibility of the Distributed matrix section, especially for the new reader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6785: --- Labels: (was: spark.tc) DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Christian Kadner Fix For: 1.5.0
{code}
scala> val d = new Date(100)
d: java.sql.Date = 1969-12-31

scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
res1: java.sql.Date = 1970-01-01
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
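The roundtrip failure in this report is characteristic of millis-to-days conversions that truncate toward zero instead of flooring, which misbehaves exactly for instants before the epoch. A hedged Python analogy of that pitfall (this is an illustration of the general failure mode, not Spark's actual DateUtils code):

```python
MILLIS_PER_DAY = 86400000

def millis_to_days_truncating(millis):
    # int() truncates toward zero: any instant up to a full day
    # *before* 1970-01-01 collapses onto day 0 (i.e. 1970-01-01).
    return int(millis / MILLIS_PER_DAY)

def millis_to_days_flooring(millis):
    # Floor division rounds toward negative infinity, so pre-epoch
    # instants land on the correct earlier day.
    return millis // MILLIS_PER_DAY

# 100 ms before the epoch is still 1969-12-31 (UTC):
print(millis_to_days_truncating(-100))  # 0  -> wrongly 1970-01-01
print(millis_to_days_flooring(-100))    # -1 -> 1969-12-31
```

Both functions agree for non-negative timestamps, which is why a bug like this can sit unnoticed until someone feeds in a pre-1970 date.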
[jira] [Updated] (SPARK-5562) LDA should handle empty documents
[ https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5562: --- Labels: starter (was: spark.tc starter) LDA should handle empty documents - Key: SPARK-5562 URL: https://issues.apache.org/jira/browse/SPARK-5562 Project: Spark Issue Type: Test Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Alok Singh Priority: Minor Labels: starter Fix For: 1.5.0 Original Estimate: 96h Remaining Estimate: 96h Latent Dirichlet Allocation (LDA) could easily be given empty documents when people select a small vocabulary. We should check to make sure it is robust to empty documents. This will hopefully take the form of a unit test, but may require modifying the LDA implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support
[ https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7265: --- Labels: (was: spark.tc) Improving documentation for Spark SQL Hive support --- Key: SPARK-7265 URL: https://issues.apache.org/jira/browse/SPARK-7265 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Trivial Fix For: 1.5.0 Miscellaneous documentation improvements for Spark SQL Hive support and YARN cluster deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs
[ https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2859: --- Labels: (was: spark.tc) Update url of Kryo project in related docs -- Key: SPARK-2859 URL: https://issues.apache.org/jira/browse/SPARK-2859 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Guancheng Chen Assignee: Guancheng Chen Priority: Trivial Fix For: 1.0.3, 1.1.0 Kryo project has been migrated from googlecode to github, hence we need to update its URL in related docs such as tuning.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1403. Resolution: Fixed Target Version/s: (was: 1.5.0) Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible without the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625739#comment-14625739 ] Patrick Wendell edited comment on SPARK-1403 at 7/14/15 2:59 AM: - Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible about the environment? Thanks! was (Author: pwendell): Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible without the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624086#comment-14624086 ] Patrick Wendell commented on SPARK-2089: Yeah - we can open it again later if someone who maintains this code is wanting to work on this feature. I just want to have this JIRA reflect the current status (i.e. for 5 versions there hasn't been any action in Spark) which is that it is not actively being fixed, and make sure the documentation correctly reflects what we have now, to discourage the use of a feature that does not work. With YARN, preferredNodeLocalityData isn't honored --- Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers. This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code. The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers. But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. This occurred during all of my runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org