[jira] [Created] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
Reynold Xin created SPARK-9166: -- Summary: Hide JVM stack trace for IllegalArgumentException in Python Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the JVM stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how it was done for AnalysisException: https://github.com/apache/spark/pull/7135 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7981) Improve DataFrame Python exception
[ https://issues.apache.org/jira/browse/SPARK-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-7981. -- Resolution: Duplicate Assignee: Davies Liu Fix Version/s: 1.5.0 Improve DataFrame Python exception -- Key: SPARK-7981 URL: https://issues.apache.org/jira/browse/SPARK-7981 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Davies Liu Fix For: 1.5.0 It would be great if most exceptions thrown are rethrown as Python exceptions, rather than some crazy Py4j exception with a long stacktrace that is not Python friendly. As an example:
{code}
In [61]: df.stat.cov('id', 'uniform')
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-61-30146c89cbd6> in <module>()
----> 1 df.stat.cov('id', 'uniform')

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1289
   1290     def cov(self, col1, col2):
-> 1291         return self.df.cov(col1, col2)
   1292
   1293     cov.__doc__ = DataFrame.cov.__doc__

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1139         if not isinstance(col2, str):
   1140             raise ValueError("col2 should be a string.")
-> 1141         return self._jdf.stat().cov(col1, col2)
   1142
   1143     @since(1.4)

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
    535         answer = self.gateway_client.send_command(command)
    536         return_value = get_return_value(answer, self.gateway_client,
--> 537                 self.target_id, self.name)
    538
    539         for temp_arg in temp_args:

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o87.cov.
: java.lang.IllegalArgumentException: requirement failed: Couldn't find column with name id
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:79)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:78)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.sql.execution.stat.StatFunctions$.collectStatisticalData(StatFunctions.scala:78)
	at org.apache.spark.sql.execution.stat.StatFunctions$.calculateCov(StatFunctions.scala:100)
	at org.apache.spark.sql.DataFrameStatFunctions.cov(DataFrameStatFunctions.scala:41)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:744)
{code}
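The fix referenced here (and in SPARK-9166) wraps the Py4J call boundary so that recognized JVM exception classes are re-raised as plain Python exceptions, hiding the Java stack trace. A minimal, self-contained sketch of that pattern follows; `FakePy4JError`, `capture_jvm_exception`, and the simulated `cov` are illustrative stand-ins, not PySpark's actual API:

```python
# Sketch: converting a Py4J-style error into a friendly Python exception.
# FakePy4JError stands in for py4j.protocol.Py4JJavaError; the real fix
# similarly inspects the Java exception's class name at the call boundary.
import functools

class FakePy4JError(Exception):
    def __init__(self, msg, java_class, java_message):
        super().__init__(msg)
        self.java_class = java_class        # e.g. "java.lang.IllegalArgumentException"
        self.java_message = java_message    # message without the JVM stack trace

class IllegalArgumentException(Exception):
    """Python-friendly exception raised instead of the raw Py4J error."""

def capture_jvm_exception(f):
    """Re-raise recognized JVM exceptions as Python exceptions."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except FakePy4JError as e:
            if e.java_class == "java.lang.IllegalArgumentException":
                # Only the JVM's message survives; the Java trace is dropped.
                raise IllegalArgumentException(e.java_message)
            raise  # unrecognized JVM errors still surface unchanged
    return wrapper

@capture_jvm_exception
def cov(col1, col2):
    # Simulate the JVM side rejecting an unknown column name.
    raise FakePy4JError(
        "An error occurred while calling o87.cov.",
        "java.lang.IllegalArgumentException",
        "requirement failed: Couldn't find column with name " + col1)
```

Calling `cov('id', 'uniform')` then raises a short `IllegalArgumentException` instead of the multi-screen Py4JJavaError shown above.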
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632292#comment-14632292 ] Apache Spark commented on SPARK-8240: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7486 string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Cheng Hao concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
[jira] [Assigned] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reassigned SPARK-8240: -- Assignee: Reynold Xin (was: Cheng Hao) string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632294#comment-14632294 ] Reynold Xin commented on SPARK-8240: [~adrian-wang] I had some time tonight and wrote a version of this that has codegen and avoids conversion back and forth between String and UTF8String. string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
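The semantics described in the ticket are easy to pin down with a tiny reference implementation. This Python sketch mirrors the documented behavior only; Spark's actual implementation operates on UTF8String with generated Java code, as the comment above notes:

```python
def concat(*args):
    """Concatenate any number of string (or bytes) arguments in order.

    Mirrors the documented behavior: concat('foo', 'bar') -> 'foobar'.
    All arguments in one call must be the same kind (all str or all
    bytes), matching the string-or-binary signature in the ticket.
    """
    if not args:
        return ""
    # Pick the joining identity based on the first argument's type.
    sep = b"" if isinstance(args[0], bytes) else ""
    return sep.join(args)
```

For example, `concat('foo', 'bar')` yields `'foobar'`, and the function accepts any number of inputs, e.g. `concat('a', 'b', 'c')`.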
[jira] [Created] (SPARK-9151) Implement code generation for Abs
Reynold Xin created SPARK-9151: -- Summary: Implement code generation for Abs Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Commented] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632342#comment-14632342 ] Apache Spark commented on SPARK-9169: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7490 Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Created] (SPARK-9169) Improve unit test coverage for null expressions
Reynold Xin created SPARK-9169: -- Summary: Improve unit test coverage for null expressions Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Assigned] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9169: --- Assignee: Apache Spark (was: Reynold Xin) Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark
[jira] [Updated] (SPARK-7218) Create a real iterator with open/close for Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7218: --- Target Version/s: 1.6.0 Create a real iterator with open/close for Spark SQL Key: SPARK-7218 URL: https://issues.apache.org/jira/browse/SPARK-7218 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin
[jira] [Commented] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632311#comment-14632311 ] Apache Spark commented on SPARK-9150: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7487 Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Summary: Create a trait to track code generation for expressions (was: Create a trait to track lack of code generation for expressions) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Created] (SPARK-9150) Create a trait to track lack of code generation for expressions
Reynold Xin created SPARK-9150: -- Summary: Create a trait to track lack of code generation for expressions Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
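The design described above is a classic "abstract method plus opt-in fallback mixin" pattern: once the base class stops providing a default, every expression without real code generation must visibly mix in the fallback, which makes such expressions trivial to find. Spark's implementation is Scala; this Python sketch only illustrates the shape, with simplified stand-in names (`gen_code`, `eval`, the example expressions):

```python
from abc import ABC, abstractmethod

class Expression(ABC):
    """No default gen_code: each concrete expression must either implement
    code generation itself or explicitly mix in CodegenFallback."""

    @abstractmethod
    def gen_code(self):
        """Return a compiled evaluator (here: just a callable)."""

    @abstractmethod
    def eval(self, row):
        """Interpreted evaluation."""

class CodegenFallback:
    """Opt-in fallback trait: the 'generated' code simply calls back into
    interpreted eval. Expressions lacking real codegen are now easy to
    track -- they are exactly the ones mixing this in."""
    def gen_code(self):
        return lambda row: self.eval(row)

class Abs(Expression):
    """Implements real code generation (cf. SPARK-9151)."""
    def __init__(self, child):
        self.child = child
    def eval(self, row):
        return abs(row[self.child])
    def gen_code(self):
        child = self.child  # 'generated' specialized evaluator
        return lambda row: abs(row[child])

class SomeComplexExpr(CodegenFallback, Expression):
    """No codegen yet; declares that fact by mixing in CodegenFallback."""
    def __init__(self, child):
        self.child = child
    def eval(self, row):
        return row[self.child] * 2
```

An `isinstance(expr, CodegenFallback)` check (or, in Scala, a grep for the mixed-in trait) is then all it takes to audit which expressions still lack code generation.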
[jira] [Comment Edited] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632319#comment-14632319 ] Reynold Xin edited comment on SPARK-4867 at 7/18/15 7:37 AM: - I believe this is done - we are now doing type coercions for UDFs, and we no longer hack the parser for new UDFs. was (Author: rxin): I believe this is done - we are now doing type coercions for UDFs. UDF clean up Key: SPARK-4867 URL: https://issues.apache.org/jira/browse/SPARK-4867 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Reynold Xin Priority: Blocker Fix For: 1.5.0 Right now our support for and internal implementation of many functions have a few issues. Specifically:
- UDFs don't know their input types and thus don't do type coercion.
- We hard-code a bunch of built-in functions into the parser. This is bad because in SQL it creates new reserved words for things that aren't actually keywords. Also it means that for each function we need to add support to both SQLContext and HiveContext separately.
For this JIRA I propose we do the following:
- Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
- Add a rule to analysis that does type coercion for UDFs.
- Add a parse rule for functions to SQLParser.
- Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
Depending on how big this refactoring becomes we could split parts 1 and 2 from part 3 above.
[jira] [Resolved] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-4867. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I believe this is done - we are now doing type coercions for UDFs. UDF clean up Key: SPARK-4867 URL: https://issues.apache.org/jira/browse/SPARK-4867 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Reynold Xin Priority: Blocker Fix For: 1.5.0 Right now our support for and internal implementation of many functions have a few issues. Specifically:
- UDFs don't know their input types and thus don't do type coercion.
- We hard-code a bunch of built-in functions into the parser. This is bad because in SQL it creates new reserved words for things that aren't actually keywords. Also it means that for each function we need to add support to both SQLContext and HiveContext separately.
For this JIRA I propose we do the following:
- Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
- Add a rule to analysis that does type coercion for UDFs.
- Add a parse rule for functions to SQLParser.
- Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
Depending on how big this refactoring becomes we could split parts 1 and 2 from part 3 above.
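The core of the registerFunction proposal above is that a UDF registered with declared input and output types lets the analyzer insert casts instead of failing at runtime. A minimal Python stand-in for that idea (Spark's real rule rewrites the Catalyst expression tree; `register_function` and `invoke` here are illustrative names, not Spark APIs):

```python
def register_function(name, func, input_types, return_type, registry):
    """Register a UDF together with its declared input and output types,
    so invocation can coerce arguments the way an analysis rule would."""
    registry[name] = (func, input_types, return_type)

def invoke(name, args, registry):
    """Look up a registered UDF and apply analysis-style type coercion:
    cast each argument to its declared type before calling."""
    func, input_types, return_type = registry[name]
    coerced = [t(a) for t, a in zip(input_types, args)]
    return return_type(func(*coerced))

registry = {}
# A UDF declared as (str) -> int; an integer argument is coerced to str
# rather than raising a type error.
register_function("str_len", lambda s: len(s), [str], int, registry)
```

With this, `invoke("str_len", [12345], registry)` coerces `12345` to `"12345"` first, instead of the UDF blowing up on a non-string input.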
[jira] [Assigned] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9019: --- Assignee: Apache Spark spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Assignee: Apache Spark Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632341#comment-14632341 ] Bolke de Bruin commented on SPARK-9019: --- I have created PR-#7489 for this issue. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Assigned] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9019: --- Assignee: (was: Apache Spark) spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
	at
[jira] [Issue Comment Deleted] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated SPARK-9019: -- Comment: was deleted (was: I have created PR-#7489 for this issue. ) spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Created] (SPARK-9161) Implement code generation for FormatNumber
Reynold Xin created SPARK-9161: -- Summary: Implement code generation for FormatNumber Key: SPARK-9161 URL: https://issues.apache.org/jira/browse/SPARK-9161 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
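FormatNumber corresponds to Hive's format_number(x, d): the value is rendered with comma-grouped thousands separators and d decimal places. A rough Python model of those semantics (an illustration of the expected output, not Spark's generated code):

```python
def format_number(x, d):
    # Comma-grouped thousands separators, rounded to d decimal places,
    # mirroring Hive's format_number(x, d).
    return f"{x:,.{d}f}"
```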
[jira] [Created] (SPARK-9163) Implement code generation for Conv
Reynold Xin created SPARK-9163: -- Summary: Implement code generation for Conv Key: SPARK-9163 URL: https://issues.apache.org/jira/browse/SPARK-9163 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9159) Implement code generation for Ascii, Base64, and UnBase64
Reynold Xin created SPARK-9159: -- Summary: Implement code generation for Ascii, Base64, and UnBase64 Key: SPARK-9159 URL: https://issues.apache.org/jira/browse/SPARK-9159 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9158) PyLint should only fail on error
Davies Liu created SPARK-9158: - Summary: PyLint should only fail on error Key: SPARK-9158 URL: https://issues.apache.org/jira/browse/SPARK-9158 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Davies Liu Priority: Critical It's boring to fight with warnings from Pylint.
[jira] [Created] (SPARK-9160) Implement code generation for Encode and Decode
Reynold Xin created SPARK-9160: -- Summary: Implement code generation for Encode and Decode Key: SPARK-9160 URL: https://issues.apache.org/jira/browse/SPARK-9160 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Resolved] (SPARK-5288) Stabilize Spark SQL data type API followup
[ https://issues.apache.org/jira/browse/SPARK-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5288. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I think everything here has been done. Stabilize Spark SQL data type API followup --- Key: SPARK-5288 URL: https://issues.apache.org/jira/browse/SPARK-5288 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Assignee: Reynold Xin Fix For: 1.5.0 Several issues we need to address before release 1.3:
* Do we want to make all classes in org.apache.spark.sql.types.dataTypes.scala public? Seems we do not need to make those abstract classes public.
* Seems NativeType is not a very clear and useful concept. Should we just remove it?
* We need to stabilize the type hierarchy of our data types. Seems StringType and DecimalType should not be primitive types.
[jira] [Updated] (SPARK-8829) Improve expression performance
[ https://issues.apache.org/jira/browse/SPARK-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8829: --- Description: This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. These expressions can be found in the org.apache.spark.sql.catalyst.expressions package. was: This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. Improve expression performance -- Key: SPARK-8829 URL: https://issues.apache.org/jira/browse/SPARK-8829 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. These expressions can be found in the org.apache.spark.sql.catalyst.expressions package.
[jira] [Resolved] (SPARK-5295) Stabilize data types
[ https://issues.apache.org/jira/browse/SPARK-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5295. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I believe the important things in this ticket have been done. We haven't explicitly defined external/internal types yet. We should create a new ticket for that. Stabilize data types Key: SPARK-5295 URL: https://issues.apache.org/jira/browse/SPARK-5295 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
1. We expose all the stuff in data types right now, including NumericTypes, etc. These should be hidden from users. We should only expose the leaf types.
2. Remove the DeveloperAPI tag from the common types.
3. Specify the internal type, external Scala type, and external Java type for each data type.
4. Add conversion functions between the internal type, external Scala type, and external Java type into each type.
[jira] [Updated] (SPARK-9081) fillna/dropna should also fill/drop NaN values in addition to null values
[ https://issues.apache.org/jira/browse/SPARK-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9081: --- Priority: Blocker (was: Major) fillna/dropna should also fill/drop NaN values in addition to null values - Key: SPARK-9081 URL: https://issues.apache.org/jira/browse/SPARK-9081 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Blocker
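Context for why NaN needs handling separate from null: NaN is an ordinary floating-point value, not a null, and it never compares equal to anything, including itself, so equality-based null checks can never catch it. A short illustrative Python snippet (not Spark code):

```python
import math

nan = float("nan")

# NaN is not null: it is a real floating-point value.
assert nan is not None
# NaN does not compare equal even to itself, so equality-based filters
# never match it; an explicit isnan check is required.
assert nan != nan
assert math.isnan(nan)
```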
[jira] [Assigned] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9167: --- Assignee: (was: Apache Spark) call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Commented] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632328#comment-14632328 ] Apache Spark commented on SPARK-9167: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7488 call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9167: --- Assignee: Apache Spark call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark
[jira] [Updated] (SPARK-9167) use UTC Calendar in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-9167: --- Summary: use UTC Calendar in `stringToDate` (was: call `millisToDays` in `stringToDate`) use UTC Calendar in `stringToDate` -- Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
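The motivation behind this rename: converting a date string to a day count must interpret the string with a UTC calendar, otherwise the result shifts with the JVM's default timezone. A hedged Python sketch of the timezone-independent computation (function name assumed, not Spark's implementation):

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def string_to_date_days(s):
    # Interpret "yyyy-MM-dd" as midnight UTC, then count whole days since
    # the Unix epoch; using the default-timezone calendar instead would
    # skew the day number by the local zone offset.
    dt = datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) // SECONDS_PER_DAY
```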
[jira] [Assigned] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9055: --- Assignee: Apache Spark WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Commented] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632344#comment-14632344 ] Apache Spark commented on SPARK-9055: - User 'yijieshen' has created a pull request for this issue: https://github.com/apache/spark/pull/7491 WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Assigned] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9055: --- Assignee: (was: Apache Spark) WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Assigned] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9150: --- Assignee: Reynold Xin (was: Apache Spark) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create CodegenFallback and Unevaluable trait
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Description: It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Additionally, this patch creates an Unevaluable trait that can be used to track expressions that don't support evaluation (e.g. Star). was: It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Create CodegenFallback and Unevaluable trait Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Additionally, this patch creates an Unevaluable trait that can be used to track expressions that don't support evaluation (e.g. Star). 
[jira] [Assigned] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9150: --- Assignee: Apache Spark (was: Reynold Xin) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create CodegenFallback and Unevaluable trait
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Summary: Create CodegenFallback and Unevaluable trait (was: Create a trait to track code generation for expressions) Create CodegenFallback and Unevaluable trait Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Created] (SPARK-9164) Implement code generation for Hex and Unhex
Reynold Xin created SPARK-9164: -- Summary: Implement code generation for Hex and Unhex Key: SPARK-9164 URL: https://issues.apache.org/jira/browse/SPARK-9164 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
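For reference, the behavior of Hex and Unhex on binary input can be modeled in a few lines of Python: two uppercase hex digits per byte, and the inverse decoding (Hex applied to a numeric input behaves differently and is not shown; this is an illustration of the semantics, not Spark's generated code):

```python
def hex_bytes(b):
    # Two uppercase hex digits per input byte, e.g. b"\x0a" -> "0A".
    return b.hex().upper()

def unhex(s):
    # Inverse of hex_bytes: decode pairs of hex digits back into bytes.
    return bytes.fromhex(s)
```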
[jira] [Created] (SPARK-9162) Implement code generation for ScalaUDF
Reynold Xin created SPARK-9162: -- Summary: Implement code generation for ScalaUDF Key: SPARK-9162 URL: https://issues.apache.org/jira/browse/SPARK-9162 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9165) Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct
Reynold Xin created SPARK-9165: -- Summary: Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct Key: SPARK-9165 URL: https://issues.apache.org/jira/browse/SPARK-9165 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9167) call `millisToDays` in `stringToDate`
Wenchen Fan created SPARK-9167: -- Summary: call `millisToDays` in `stringToDate` Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Created] (SPARK-9156) Implement code generation for StringSplit
Reynold Xin created SPARK-9156: -- Summary: Implement code generation for StringSplit Key: SPARK-9156 URL: https://issues.apache.org/jira/browse/SPARK-9156 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9153) Implement code generation for StringLPad and StringRPad
Reynold Xin created SPARK-9153: -- Summary: Implement code generation for StringLPad and StringRPad Key: SPARK-9153 URL: https://issues.apache.org/jira/browse/SPARK-9153 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
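The SQL semantics these two expressions implement: pad the input with a fill string, cycled as needed, up to the target length, truncating when the input is already longer. A small Python model of that behavior (illustrative only, not the Spark implementation):

```python
def lpad(s, n, pad):
    # Left-pad s with `pad` repeated/cycled up to length n;
    # truncate s to n characters if it is already longer.
    if len(s) >= n:
        return s[:n]
    return (pad * n)[: n - len(s)] + s

def rpad(s, n, pad):
    # Same, but the fill goes on the right.
    if len(s) >= n:
        return s[:n]
    return s + (pad * n)[: n - len(s)]
```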
[jira] [Created] (SPARK-9152) Implement code generation for Like and RLike
Reynold Xin created SPARK-9152: -- Summary: Implement code generation for Like and RLike Key: SPARK-9152 URL: https://issues.apache.org/jira/browse/SPARK-9152 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9154) Implement code generation for StringFormat
Reynold Xin created SPARK-9154: -- Summary: Implement code generation for StringFormat Key: SPARK-9154 URL: https://issues.apache.org/jira/browse/SPARK-9154 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9155) Implement code generation for StringSpace
Reynold Xin created SPARK-9155: -- Summary: Implement code generation for StringSpace Key: SPARK-9155 URL: https://issues.apache.org/jira/browse/SPARK-9155 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9168) Add nanvl expression
Reynold Xin created SPARK-9168: -- Summary: Add nanvl expression Key: SPARK-9168 URL: https://issues.apache.org/jira/browse/SPARK-9168 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Similar to Oracle's nanvl: nanvl(v1, v2) returns v2 if v1 is NaN; otherwise it returns v1.
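A minimal Python sketch of the proposed semantics (an illustration of the behavior the ticket describes, not Spark's implementation; nulls pass through untouched, since NaN handling is separate from null handling):

```python
import math

def nanvl(v1, v2):
    # Oracle-style nanvl: substitute v2 only when v1 is NaN;
    # None (SQL NULL) and ordinary values are returned unchanged.
    if isinstance(v1, float) and math.isnan(v1):
        return v2
    return v1
```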
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632340#comment-14632340 ] Apache Spark commented on SPARK-9019: - User 'bolkedebruin' has created a pull request for this issue: https://github.com/apache/spark/pull/7489 spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
[jira] [Assigned] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9169: --- Assignee: Reynold Xin (was: Apache Spark) Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Created] (SPARK-9170) ORC data source creates a schema with lowercase table names
Peter Rudenko created SPARK-9170: Summary: ORC data source creates a schema with lowercase table names Key: SPARK-9170 URL: https://issues.apache.org/jira/browse/SPARK-9170 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Reporter: Peter Rudenko Steps to reproduce:
{code}
sqlContext.range(0, 10).select('id as "Acol").write.format("orc").save("/tmp/foo")
sqlContext.read.format("orc").load("/tmp/foo").schema("Acol")
// java.lang.IllegalArgumentException: Field Acol does not exist.
sqlContext.read.format("orc").load("/tmp/foo").schema("acol")
// org.apache.spark.sql.types.StructField = StructField(acol,LongType,true)
sqlContext.read.format("orc").load("/tmp/foo").select("Acol").show()
// +----+
// |Acol|
// +----+
// |   1|
// |   5|
// |   3|
// |   4|
// |   7|
// |   2|
// |   6|
// |   8|
// |   9|
// |   0|
// +----+
{code}
[jira] [Updated] (SPARK-9170) ORC data source creates a schema with lowercase table names
[ https://issues.apache.org/jira/browse/SPARK-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Rudenko updated SPARK-9170: Description: Steps to reproduce:
{code}
sqlContext.range(0, 10).select('id as "Acol").write.format("orc").save("/tmp/foo")

sqlContext.read.format("orc").load("/tmp/foo").schema("Acol")
// java.lang.IllegalArgumentException: Field "Acol" does not exist.

sqlContext.read.format("orc").load("/tmp/foo").schema("acol")
// org.apache.spark.sql.types.StructField = StructField(acol,LongType,true)

sqlContext.read.format("orc").load("/tmp/foo").select("Acol").show()
// +----+
// |Acol|
// +----+
// |   1|
// |   5|
// |   3|
// |   4|
// |   7|
// |   2|
// |   6|
// |   8|
// |   9|
// |   0|
// +----+
{code}
was: (the same text)

ORC data source creates a schema with lowercase table names Key: SPARK-9170 URL: https://issues.apache.org/jira/browse/SPARK-9170 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Reporter: Peter Rudenko
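The lookups above fail because StructType field access is case-sensitive while the ORC writer stored lowercase names. Until the data source preserves case, a caller can resolve requested names case-insensitively against the stored schema. A minimal Python sketch with a hypothetical helper (`resolve_field` is illustrative, not a Spark API):

```python
def resolve_field(schema_fields, requested):
    """Return the stored field name matching `requested` case-insensitively.

    schema_fields: list of field names as stored on disk (here, lowercased).
    Raises ValueError with a message shaped like the one in the report.
    """
    matches = [f for f in schema_fields if f.lower() == requested.lower()]
    if not matches:
        raise ValueError("Field %s does not exist." % requested)
    return matches[0]

# After the ORC round trip the schema only contains lowercase names:
stored = ["acol"]
print(resolve_field(stored, "Acol"))  # -> acol
```

This only helps when the lowercased names are still unambiguous; two columns differing only by case cannot be recovered this way.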
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632476#comment-14632476 ] Cody Koeninger commented on SPARK-9059: How is this different from SPARK-8390? I thought the idea there was to keep word count simple, and have separate examples for offsets. Restarting from specific offsets is a good idea, but requires storage. If we want to move the examples from my external repo https://github.com/koeninger/kafka-exactly-once into Spark, it would probably require switching from Postgres to an in-memory database.

Update Direct Kafka Word count examples to show the use of HasOffsetRanges Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter

Update the Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print them. For example, in Scala:
{code}
var offsetRanges: Array[OffsetRange] = _
...
directKafkaDStream.foreachRDD { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
}
...
transformedDStream.foreachRDD { rdd =>
  // some operation
  println("Processed ranges: " + offsetRanges)
}
{code}
See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more up-to-date information on Python.
[jira] [Assigned] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9094: --- Assignee: (was: Apache Spark) Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Anders Düvel updated SPARK-9094: - Description: This change is described in pull request: https://github.com/apache/spark/pull/7493 was: This change is described in pull request: https://github.com/apache/spark/pull/7422 Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7493 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9094: --- Assignee: Apache Spark Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Assignee: Apache Spark Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632377#comment-14632377 ] Apache Spark commented on SPARK-9094: - User 'hackbert' has created a pull request for this issue: https://github.com/apache/spark/pull/7493 Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632425#comment-14632425 ] Apache Spark commented on SPARK-9171: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7496 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9171: --- Assignee: (was: Apache Spark) add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Anders Düvel updated SPARK-9094: - External issue URL: https://github.com/apache/spark/pull/7493 (was: https://github.com/apache/spark/pull/7422) Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7493 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning
[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632373#comment-14632373 ] Apache Spark commented on SPARK-6910: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/7492 Support for pushing predicates down to metastore for partition pruning -- Key: SPARK-6910 URL: https://issues.apache.org/jira/browse/SPARK-6910 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Cheolsoo Park Priority: Critical Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8951: Assignee: Apache Spark

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Assignee: Apache Spark Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
[jira] [Commented] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632408#comment-14632408 ] Apache Spark commented on SPARK-8951: User 'CHOIJAEHONG1' has created a pull request for this issue: https://github.com/apache/spark/pull/7494

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
[jira] [Assigned] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8951: Assignee: (was: Apache Spark)

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
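The root cause in the `writeString` above is visible without Spark: the length written to the stream is a character count (`value.length`), but what follows is a byte payload, and for CJK text the two differ. A short Python sketch of the mismatch and of a writer that length-prefixes by encoded byte count (illustrative framing only, not the actual SparkR SerDe wire format):

```python
# Two CJK characters: 2 chars, but 6 bytes in UTF-8 (3 bytes each).
s = "가나"
assert len(s) == 2                    # what the old SerDe wrote as the length
assert len(s.encode("utf-8")) == 6    # what actually crosses the wire

def encode_string(value):
    """Length-prefix the UTF-8 payload with its byte count, not its char count."""
    data = value.encode("utf-8")
    return len(data).to_bytes(4, "big") + data

frame = encode_string(s)
# The prefix now matches the payload size, so a reader can't run short
# and hit stray continuation bytes (the 'embedded nul' symptom above).
assert int.from_bytes(frame[:4], "big") == len(frame) - 4
```

For pure ASCII the two counts coincide, which is why the original code appeared to work.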
[jira] [Created] (SPARK-9171) add and improve tests for nondeterministic expressions
Wenchen Fan created SPARK-9171: -- Summary: add and improve tests for nondeterministic expressions Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9171: --- Assignee: Apache Spark add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9081) fillna/dropna should also fill/drop NaN values in addition to null values
[ https://issues.apache.org/jira/browse/SPARK-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632469#comment-14632469 ] Yijie Shen commented on SPARK-9081: --- OK, I'll take this fillna/dropna should also fill/drop NaN values in addition to null values - Key: SPARK-9081 URL: https://issues.apache.org/jira/browse/SPARK-9081 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9142) Removing unnecessary self types in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632422#comment-14632422 ] Apache Spark commented on SPARK-9142: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7495 Removing unnecessary self types in Catalyst --- Key: SPARK-9142 URL: https://issues.apache.org/jira/browse/SPARK-9142 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 A small change, based on code review and offline discussion with [~dragos]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632528#comment-14632528 ] Sean Owen commented on SPARK-9059: Yeah, closing this as a duplicate unless someone can update the title to differentiate it from "Update DirectKafkaWordCount examples to show how offset ranges can be used" and https://github.com/apache/spark/pull/6863/files, which was committed.

Update Direct Kafka Word count examples to show the use of HasOffsetRanges Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter

Update the Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print them. For example, in Scala:
{code}
var offsetRanges: Array[OffsetRange] = _
...
directKafkaDStream.foreachRDD { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
}
...
transformedDStream.foreachRDD { rdd =>
  // some operation
  println("Processed ranges: " + offsetRanges)
}
{code}
See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more up-to-date information on Python.
[jira] [Resolved] (SPARK-8390) Update DirectKafkaWordCount examples to show how offset ranges can be used
[ https://issues.apache.org/jira/browse/SPARK-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-8390. -- Resolution: Fixed Fix Version/s: 1.5.0 1.4.2 Target Version/s: (was: 1.5.0) Update DirectKafkaWordCount examples to show how offset ranges can be used -- Key: SPARK-8390 URL: https://issues.apache.org/jira/browse/SPARK-8390 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Cody Koeninger Fix For: 1.4.2, 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9151: --- Assignee: (was: Apache Spark) Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9151: --- Assignee: Apache Spark Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9151: --- Assignee: Liang-Chi Hsieh Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Liang-Chi Hsieh -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8159) Improve expression function coverage
[ https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8159: Summary: Improve expression function coverage (was: Improve SQL/DataFrame expression coverage)

Improve expression function coverage Key: SPARK-8159 URL: https://issues.apache.org/jira/browse/SPARK-8159 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin

This is an umbrella ticket to track new expressions we are adding to SQL/DataFrame. For each new expression, we should:
1. Add a new Expression implementation in org.apache.spark.sql.catalyst.expressions.
2. If applicable, implement the code-generated version (by implementing genCode).
3. Add comprehensive unit tests (for all the data types the expression supports).
4. If applicable, add a new function for DataFrame in org.apache.spark.sql.functions, and in python/pyspark/sql/functions.py for Python.
For date/time functions, put them in expressions/datetime.scala, and create a DateTimeFunctionSuite.scala for testing.
[jira] [Updated] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9171: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-8159 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Trivial Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9151. Resolution: Fixed Fix Version/s: 1.5.0 Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Liang-Chi Hsieh Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9101) Can't use null in selectExpr
[ https://issues.apache.org/jira/browse/SPARK-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9101: Assignee: Apache Spark

Can't use null in selectExpr Key: SPARK-9101 URL: https://issues.apache.org/jira/browse/SPARK-9101 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Mateusz Buśkiewicz Assignee: Apache Spark

In 1.3.1 this worked:
{code:python}
df = sqlContext.createDataFrame([[1]], schema=['col'])
df.selectExpr('null as newCol').collect()
{code}
In 1.4.0 it fails with the following stacktrace:
{code}
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 316, in collect
    cls = _create_cls(self.schema)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 229, in schema
    self._schema = _parse_datatype_json_string(self._jdf.schema().json())
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 519, in _parse_datatype_json_string
    return _parse_datatype_json_value(json.loads(json_string))
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 539, in _parse_datatype_json_value
    return _all_complex_types[tpe].fromJson(json_value)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 386, in fromJson
    return StructType([StructField.fromJson(f) for f in json["fields"]])
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 347, in fromJson
    _parse_datatype_json_value(json["type"]),
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 535, in _parse_datatype_json_value
    raise ValueError("Could not parse datatype: %s" % json_value)
ValueError: Could not parse datatype: null
{code}
https://github.com/apache/spark/blob/v1.4.0/python/pyspark/sql/types.py#L461
The cause: _atomic_types doesn't contain NullType.
[jira] [Commented] (SPARK-9101) Can't use null in selectExpr
[ https://issues.apache.org/jira/browse/SPARK-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632571#comment-14632571 ] Apache Spark commented on SPARK-9101: User 'sixers' has created a pull request for this issue: https://github.com/apache/spark/pull/7499

Can't use null in selectExpr Key: SPARK-9101 URL: https://issues.apache.org/jira/browse/SPARK-9101 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Mateusz Buśkiewicz

In 1.3.1 this worked:
{code:python}
df = sqlContext.createDataFrame([[1]], schema=['col'])
df.selectExpr('null as newCol').collect()
{code}
In 1.4.0 it fails with the following stacktrace:
{code}
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 316, in collect
    cls = _create_cls(self.schema)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 229, in schema
    self._schema = _parse_datatype_json_string(self._jdf.schema().json())
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 519, in _parse_datatype_json_string
    return _parse_datatype_json_value(json.loads(json_string))
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 539, in _parse_datatype_json_value
    return _all_complex_types[tpe].fromJson(json_value)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 386, in fromJson
    return StructType([StructField.fromJson(f) for f in json["fields"]])
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 347, in fromJson
    _parse_datatype_json_value(json["type"]),
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 535, in _parse_datatype_json_value
    raise ValueError("Could not parse datatype: %s" % json_value)
ValueError: Could not parse datatype: null
{code}
https://github.com/apache/spark/blob/v1.4.0/python/pyspark/sql/types.py#L461
The cause: _atomic_types doesn't contain NullType.
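The failure mode can be shown in isolation: the parser keeps a name-to-type table for atomic types and falls through to the ValueError when "null" is missing. A simplified stand-in for the real pyspark.sql.types code (the table contents here are made up for illustration):

```python
# Toy model of _parse_datatype_json_value: atomic type names map to types.
# "null" is deliberately absent, mirroring the bug.
_atomic_types = {"integer": int, "string": str, "boolean": bool}

def parse_datatype(json_value):
    """Resolve an atomic type name, like the branch that raises in the report."""
    if json_value in _atomic_types:
        return _atomic_types[json_value]
    raise ValueError("Could not parse datatype: %s" % json_value)

try:
    parse_datatype("null")
except ValueError as e:
    print(e)  # Could not parse datatype: null

# The fix amounts to registering the null type alongside the others:
_atomic_types["null"] = type(None)
assert parse_datatype("null") is type(None)
```

With the entry registered, schemas containing `null as newCol` columns round-trip through the JSON parser instead of raising.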
[jira] [Resolved] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-8240. Resolution: Fixed Fix Version/s: 1.5.0 string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9109) Unpersist a graph object does not work properly
[ https://issues.apache.org/jira/browse/SPARK-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien-Dung LE closed SPARK-9109. Resolution: Fixed The change has been merged at https://github.com/apache/spark/pull/7469

Unpersist a graph object does not work properly Key: SPARK-9109 URL: https://issues.apache.org/jira/browse/SPARK-9109 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.3.1, 1.4.0 Reporter: Tien-Dung LE Priority: Minor

Unpersisting a graph object does not work properly. Here is the code to reproduce it:
{code}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.slf4j.LoggerFactory
import org.apache.spark.graphx.util.GraphGenerators

val graph: Graph[Long, Long] = GraphGenerators
  .logNormalGraph(sc, numVertices = 100)
  .mapVertices((id, _) => id.toLong)
  .mapEdges(e => e.attr.toLong)

graph.cache().numEdges
graph.unpersist()
{code}
After this, there should not be any cached RDDs in storage (http://localhost:4040/storage/), but some remain.
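The symptom reported above is consistent with a composite object that is backed by more than one cached dataset (a Graph caches both its vertices and its edges), where the unpersist path releases only part of what was persisted. A toy Python model of that failure shape, assuming this framing; it is not GraphX internals:

```python
class Cache:
    """Stand-in for the block manager's set of cached entries."""
    def __init__(self):
        self.entries = set()

    def persist(self, name):
        self.entries.add(name)

    def unpersist(self, name):
        self.entries.discard(name)


class Graph:
    """A composite object persisting two underlying datasets on creation."""
    def __init__(self, cache):
        self.cache = cache
        cache.persist("vertices")
        cache.persist("edges")

    def unpersist_broken(self):
        # Releases only one of the two backing datasets -> a leaked entry.
        self.cache.unpersist("vertices")

    def unpersist_fixed(self):
        # A correct unpersist must release everything persist() acquired.
        self.cache.unpersist("vertices")
        self.cache.unpersist("edges")


cache = Cache()
g = Graph(cache)
g.unpersist_broken()
assert cache.entries == {"edges"}   # leftover cached block, as in the report
g.unpersist_fixed()
assert cache.entries == set()       # storage page would now be empty
```

The general lesson: every resource acquired in the persist path needs a matching release in the unpersist path.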
[jira] [Commented] (SPARK-7961) Redesign SQLConf for better error message reporting
[ https://issues.apache.org/jira/browse/SPARK-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632593#comment-14632593 ] Reynold Xin commented on SPARK-7961: [~zsxwing] I tried this today, and it looks like it doesn't work in bin/spark-sql using "set -v". Can you take a look?

Redesign SQLConf for better error message reporting Key: SPARK-7961 URL: https://issues.apache.org/jira/browse/SPARK-7961 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Shixiong Zhu Priority: Critical Fix For: 1.5.0

Right now, we don't validate config values and as a result will throw exceptions when queries or DataFrame operations are run. Imagine one user sets the config variable spark.sql.retainGroupColumns (which requires "true" or "false") to "hello". The set action itself completes fine. When another user runs a query, it throws the following exception:
{code}
java.lang.IllegalArgumentException: For input string: "hello"
  at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:238)
  at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:226)
  at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:31)
  at org.apache.spark.sql.SQLConf.dataFrameRetainGroupColumns(SQLConf.scala:265)
  at org.apache.spark.sql.GroupedData.toDF(GroupedData.scala:74)
  at org.apache.spark.sql.GroupedData.agg(GroupedData.scala:227)
{code}
This is highly confusing. We should redesign SQLConf to validate input at set time (during the setConf call).
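The redesign the ticket asks for is fail-fast validation: reject a bad value when it is set, with the offending key in the message, instead of raising later when the value is read. A minimal Python sketch of that idea (class and function names here are illustrative, not Spark's SQLConf code):

```python
class ConfEntry:
    """A typed config key: the converter both parses and validates."""
    def __init__(self, key, converter):
        self.key = key
        self.converter = converter


def to_bool(raw):
    if raw.lower() not in ("true", "false"):
        raise ValueError("value must be 'true' or 'false', got %r" % raw)
    return raw.lower() == "true"


RETAIN_GROUP_COLUMNS = ConfEntry("spark.sql.retainGroupColumns", to_bool)


class SQLConf:
    def __init__(self):
        self.settings = {}

    def set(self, entry, raw):
        # Validate at set time: a bad value fails here, at the point of
        # misuse, rather than inside an unrelated query later.
        try:
            self.settings[entry.key] = entry.converter(raw)
        except ValueError as e:
            raise ValueError("%s: %s" % (entry.key, e))


conf = SQLConf()
conf.set(RETAIN_GROUP_COLUMNS, "true")      # accepted
try:
    conf.set(RETAIN_GROUP_COLUMNS, "hello")  # rejected immediately
except ValueError as e:
    print(e)
```

The key design point is that the error message names the config key and the allowed values, instead of the bare "For input string" from a deep parse failure.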
[jira] [Created] (SPARK-9174) Add documentation for all public SQLConfs
Reynold Xin created SPARK-9174: -- Summary: Add documentation for all public SQLConfs Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9174. Resolution: Fixed Fix Version/s: 1.5.0 Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
[jira] [Resolved] (SPARK-9055) WidenTypes should also support Intersect and Except in addition to Union
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9055. Resolution: Fixed Assignee: Yijie Shen Fix Version/s: 1.5.0 WidenTypes should also support Intersect and Except in addition to Union Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Yijie Shen Fix For: 1.5.0 HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Commented] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632595#comment-14632595 ] Apache Spark commented on SPARK-9174: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7500 Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Assigned] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9174: --- Assignee: Apache Spark (was: Reynold Xin) Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Apache Spark
[jira] [Assigned] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9174: --- Assignee: Reynold Xin (was: Apache Spark) Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Commented] (SPARK-8278) Remove deprecated JsonRDD functionality in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632596#comment-14632596 ] Apache Spark commented on SPARK-8278: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7501 Remove deprecated JsonRDD functionality in Spark SQL Key: SPARK-8278 URL: https://issues.apache.org/jira/browse/SPARK-8278 Project: Spark Issue Type: Story Components: SQL Reporter: Nathan Howell Priority: Critical The old JSON functionality (deprecated in 1.4) needs to be removed for 1.5.
[jira] [Commented] (SPARK-6793) Implement perplexity for LDA
[ https://issues.apache.org/jira/browse/SPARK-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632598#comment-14632598 ] Feynman Liang commented on SPARK-6793: -- I'm working on this. Implement perplexity for LDA Key: SPARK-6793 URL: https://issues.apache.org/jira/browse/SPARK-6793 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Original Estimate: 168h Remaining Estimate: 168h LDA should be able to compute perplexity. This JIRA is for computing it on the training dataset. See the linked JIRA for computing it on a new corpus: [SPARK-5567]
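For context on the metric the ticket asks for: corpus perplexity is conventionally derived from the log-likelihood the model already computes. A minimal sketch of the standard definition (not Spark's implementation):

```python
import math

def perplexity(log_likelihood, token_count):
    """Standard corpus perplexity: exp(-logLikelihood / #tokens).

    Lower is better; it can be read as the model's effective
    per-token branching factor on the corpus.
    """
    return math.exp(-log_likelihood / token_count)
```

For example, a model that assigns each of 100 tokens probability 1/2 has log-likelihood 100 * ln(1/2) and therefore perplexity 2.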
[jira] [Commented] (SPARK-5567) Add prediction methods to LDA
[ https://issues.apache.org/jira/browse/SPARK-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632597#comment-14632597 ] Feynman Liang commented on SPARK-5567: -- I'm working on this. Add prediction methods to LDA - Key: SPARK-5567 URL: https://issues.apache.org/jira/browse/SPARK-5567 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Original Estimate: 168h Remaining Estimate: 168h LDA currently supports prediction on the training set. E.g., you can call logLikelihood and topicDistributions to get that info for the training data. However, it should support the same functionality for new (test) documents. This will require inference but should be able to use the same code, with a few modifications to keep the inferred topics fixed. Note: The API for these methods is already in the code but is commented out.
[jira] [Updated] (SPARK-7690) MulticlassClassificationEvaluator for tuning Multiclass Classifiers
[ https://issues.apache.org/jira/browse/SPARK-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7690: - Shepherd: Joseph K. Bradley (was: Ram Sriharsha) MulticlassClassificationEvaluator for tuning Multiclass Classifiers --- Key: SPARK-7690 URL: https://issues.apache.org/jira/browse/SPARK-7690 Project: Spark Issue Type: Improvement Components: ML Reporter: Ram Sriharsha Assignee: Eron Wright Provide a MulticlassClassificationEvaluator with weighted F1-score to tune multiclass classifiers using the Pipeline API. MLlib already provides MulticlassMetrics functionality, which can be wrapped in a MulticlassClassificationEvaluator to expose weighted F1-score as a metric. The functionality could be similar to scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) in that we can support micro, macro, and weighted versions of the F1-score (with weighted being the default).
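The micro/macro/weighted averaging modes the ticket mentions can be illustrated from scratch. This is a sketch of the metric definitions only, assuming nothing about Spark's or scikit-learn's actual APIs:

```python
from collections import Counter

def f1_scores(y_true, y_pred, average="weighted"):
    """Micro, macro, or support-weighted F1 over all observed labels.

    A from-scratch illustration of the three averaging modes named in
    the ticket; not Spark's evaluator.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    if average == "micro":
        # Pool counts over all classes, then compute a single F1.
        return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class = {l: f1(tp[l], fp[l], fn[l]) for l in labels}
    if average == "macro":
        # Unweighted mean of per-class F1.
        return sum(per_class.values()) / len(labels)
    # "weighted": mean of per-class F1, weighted by true-label support.
    support, n = Counter(y_true), len(y_true)
    return sum(per_class[l] * support[l] / n for l in labels)
```

The weighted variant makes the default robust to class imbalance: frequent classes contribute proportionally more to the score than rare ones.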
[jira] [Updated] (SPARK-7690) MulticlassClassificationEvaluator for tuning Multiclass Classifiers
[ https://issues.apache.org/jira/browse/SPARK-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7690: - Issue Type: New Feature (was: Improvement) MulticlassClassificationEvaluator for tuning Multiclass Classifiers --- Key: SPARK-7690 URL: https://issues.apache.org/jira/browse/SPARK-7690 Project: Spark Issue Type: New Feature Components: ML Reporter: Ram Sriharsha Assignee: Ram Sriharsha Provide a MulticlassClassificationEvaluator with weighted F1-score to tune multiclass classifiers using the Pipeline API. MLlib already provides MulticlassMetrics functionality, which can be wrapped in a MulticlassClassificationEvaluator to expose weighted F1-score as a metric. The functionality could be similar to scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) in that we can support micro, macro, and weighted versions of the F1-score (with weighted being the default).
[jira] [Commented] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632533#comment-14632533 ] Apache Spark commented on SPARK-9151: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/7498 Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Resolved] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9169. Resolution: Fixed Fix Version/s: 1.5.0 Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
[jira] [Resolved] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9171. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.5.0 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Trivial Fix For: 1.5.0
[jira] [Commented] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632494#comment-14632494 ] Apache Spark commented on SPARK-9166: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/7497 Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
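The general technique behind this ticket is to inspect the Java exception class carried by a Py4J error and re-raise it as a plain Python exception with the JVM traceback suppressed. A hedged sketch follows; the `JavaError` class and `convert_java_error` decorator are stand-ins for illustration, not PySpark's or py4j's actual helpers:

```python
# Illustrative sketch only: convert a wrapped Java-side exception into a
# clean Python exception, dropping the JVM stack trace. The names here
# (JavaError, convert_java_error) are hypothetical stand-ins.

class JavaError(Exception):
    """Stand-in for a Py4J error that wraps a Java-side exception."""
    def __init__(self, java_class, message):
        super().__init__(message)
        self.java_class = java_class

def convert_java_error(fn):
    """Re-raise selected Java exceptions as plain Python exceptions."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except JavaError as e:
            if e.java_class == "java.lang.IllegalArgumentException":
                # 'from None' suppresses the chained (JVM-side) traceback
                # so the user sees only the Python-level error.
                raise ValueError(str(e)) from None
            raise
    return wrapper
```

Applied to DataFrame methods, a wrapper like this would surface `ValueError: ...` to the user instead of a long Py4JJavaError stack trace.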
[jira] [Assigned] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9166: --- Assignee: (was: Apache Spark) Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
[jira] [Assigned] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9166: --- Assignee: Apache Spark Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
[jira] [Resolved] (SPARK-9167) use UTC Calendar in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9167. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.5.0 use UTC Calendar in `stringToDate` -- Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 1.5.0