[jira] [Created] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
Reynold Xin created SPARK-9166: -- Summary: Hide JVM stack trace for IllegalArgumentException in Python Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the JVM stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how it was done for AnalysisException: https://github.com/apache/spark/pull/7135 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7981) Improve DataFrame Python exception
[ https://issues.apache.org/jira/browse/SPARK-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-7981. -- Resolution: Duplicate Assignee: Davies Liu Fix Version/s: 1.5.0 Improve DataFrame Python exception -- Key: SPARK-7981 URL: https://issues.apache.org/jira/browse/SPARK-7981 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Davies Liu Fix For: 1.5.0 It would be great if most exceptions thrown are rethrown as Python exceptions, rather than some crazy Py4j exception with a long stacktrace that is not Python friendly. As an example:
{code}
In [61]: df.stat.cov('id', 'uniform')
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-61-30146c89cbd6> in <module>()
----> 1 df.stat.cov('id', 'uniform')

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1289
   1290     def cov(self, col1, col2):
-> 1291         return self.df.cov(col1, col2)
   1292
   1293     cov.__doc__ = DataFrame.cov.__doc__

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1139         if not isinstance(col2, str):
   1140             raise ValueError("col2 should be a string.")
-> 1141         return self._jdf.stat().cov(col1, col2)
   1142
   1143     @since(1.4)

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
    535         answer = self.gateway_client.send_command(command)
    536         return_value = get_return_value(answer, self.gateway_client,
--> 537                 self.target_id, self.name)
    538
    539         for temp_arg in temp_args:

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o87.cov.
: java.lang.IllegalArgumentException: requirement failed: Couldn't find column with name id
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:79)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:78)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.sql.execution.stat.StatFunctions$.collectStatisticalData(StatFunctions.scala:78)
	at org.apache.spark.sql.execution.stat.StatFunctions$.calculateCov(StatFunctions.scala:100)
	at org.apache.spark.sql.DataFrameStatFunctions.cov(DataFrameStatFunctions.scala:41)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:744)
{code}
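The fix referenced here (and in SPARK-9166) wraps the Py4J call boundary so that recognized JVM exception classes are re-raised as plain Python exceptions, hiding the Java stack trace. A minimal, self-contained sketch of that pattern follows; `FakePy4JError`, `capture_jvm_exception`, and the simulated `cov` are illustrative stand-ins, not PySpark's actual API:

```python
# Sketch: converting a Py4J-style error into a friendly Python exception.
# FakePy4JError stands in for py4j.protocol.Py4JJavaError; the real fix
# similarly inspects the Java exception's class name at the call boundary.
import functools

class FakePy4JError(Exception):
    def __init__(self, msg, java_class, java_message):
        super().__init__(msg)
        self.java_class = java_class        # e.g. "java.lang.IllegalArgumentException"
        self.java_message = java_message    # message without the JVM stack trace

class IllegalArgumentException(Exception):
    """Python-friendly exception raised instead of the raw Py4J error."""

def capture_jvm_exception(f):
    """Re-raise recognized JVM exceptions as Python exceptions."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except FakePy4JError as e:
            if e.java_class == "java.lang.IllegalArgumentException":
                # Only the JVM's message survives; the Java trace is dropped.
                raise IllegalArgumentException(e.java_message)
            raise  # unrecognized JVM errors still surface unchanged
    return wrapper

@capture_jvm_exception
def cov(col1, col2):
    # Simulate the JVM side rejecting an unknown column name.
    raise FakePy4JError(
        "An error occurred while calling o87.cov.",
        "java.lang.IllegalArgumentException",
        "requirement failed: Couldn't find column with name " + col1)
```

Calling `cov('id', 'uniform')` then raises a short `IllegalArgumentException` instead of the multi-screen Py4JJavaError shown above.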
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632292#comment-14632292 ] Apache Spark commented on SPARK-8240: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7486 string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Cheng Hao concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
[jira] [Assigned] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reassigned SPARK-8240: -- Assignee: Reynold Xin (was: Cheng Hao) string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632294#comment-14632294 ] Reynold Xin commented on SPARK-8240: [~adrian-wang] I had some time tonight and wrote a version of this that has codegen and avoids conversion back and forth between String and UTF8String. string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
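The semantics described in the ticket are easy to pin down with a tiny reference implementation. This Python sketch mirrors the documented behavior only; Spark's actual implementation operates on UTF8String with generated Java code, as the comment above notes:

```python
def concat(*args):
    """Concatenate any number of string (or bytes) arguments in order.

    Mirrors the documented behavior: concat('foo', 'bar') -> 'foobar'.
    All arguments in one call must be the same kind (all str or all
    bytes), matching the string-or-binary signature in the ticket.
    """
    if not args:
        return ""
    # Pick the joining identity based on the first argument's type.
    sep = b"" if isinstance(args[0], bytes) else ""
    return sep.join(args)
```

For example, `concat('foo', 'bar')` yields `'foobar'`, and the function accepts any number of inputs, e.g. `concat('a', 'b', 'c')`.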
[jira] [Created] (SPARK-9151) Implement code generation for Abs
Reynold Xin created SPARK-9151: -- Summary: Implement code generation for Abs Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Commented] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632342#comment-14632342 ] Apache Spark commented on SPARK-9169: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7490 Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Created] (SPARK-9169) Improve unit test coverage for null expressions
Reynold Xin created SPARK-9169: -- Summary: Improve unit test coverage for null expressions Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Assigned] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9169: --- Assignee: Apache Spark (was: Reynold Xin) Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark
[jira] [Updated] (SPARK-7218) Create a real iterator with open/close for Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7218: --- Target Version/s: 1.6.0 Create a real iterator with open/close for Spark SQL Key: SPARK-7218 URL: https://issues.apache.org/jira/browse/SPARK-7218 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin
[jira] [Commented] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632311#comment-14632311 ] Apache Spark commented on SPARK-9150: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7487 Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Summary: Create a trait to track code generation for expressions (was: Create a trait to track lack of code generation for expressions) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Created] (SPARK-9150) Create a trait to track lack of code generation for expressions
Reynold Xin created SPARK-9150: -- Summary: Create a trait to track lack of code generation for expressions Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expressions support code generation and which do not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
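The design described above is a classic "abstract method plus opt-in fallback mixin" pattern: once the base class stops providing a default, every expression without real code generation must visibly mix in the fallback, which makes such expressions trivial to find. Spark's implementation is Scala; this Python sketch only illustrates the shape, with simplified stand-in names (`gen_code`, `eval`, the example expressions):

```python
from abc import ABC, abstractmethod

class Expression(ABC):
    """No default gen_code: each concrete expression must either implement
    code generation itself or explicitly mix in CodegenFallback."""

    @abstractmethod
    def gen_code(self):
        """Return a compiled evaluator (here: just a callable)."""

    @abstractmethod
    def eval(self, row):
        """Interpreted evaluation."""

class CodegenFallback:
    """Opt-in fallback trait: the 'generated' code simply calls back into
    interpreted eval. Expressions lacking real codegen are now easy to
    track -- they are exactly the ones mixing this in."""
    def gen_code(self):
        return lambda row: self.eval(row)

class Abs(Expression):
    """Implements real code generation (cf. SPARK-9151)."""
    def __init__(self, child):
        self.child = child
    def eval(self, row):
        return abs(row[self.child])
    def gen_code(self):
        child = self.child  # 'generated' specialized evaluator
        return lambda row: abs(row[child])

class SomeComplexExpr(CodegenFallback, Expression):
    """No codegen yet; declares that fact by mixing in CodegenFallback."""
    def __init__(self, child):
        self.child = child
    def eval(self, row):
        return row[self.child] * 2
```

An `isinstance(expr, CodegenFallback)` check (or, in Scala, a grep for the mixed-in trait) is then all it takes to audit which expressions still lack code generation.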
[jira] [Comment Edited] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632319#comment-14632319 ] Reynold Xin edited comment on SPARK-4867 at 7/18/15 7:37 AM: - I believe this is done - we are now doing type coercions for UDFs, and we no longer hack the parser for new UDFs. was (Author: rxin): I believe this is done - we are now doing type coercions for UDFs. UDF clean up Key: SPARK-4867 URL: https://issues.apache.org/jira/browse/SPARK-4867 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Reynold Xin Priority: Blocker Fix For: 1.5.0 Right now our support for and internal implementation of many functions have a few issues. Specifically:
- UDFs don't know their input types and thus don't do type coercion.
- We hard-code a bunch of built-in functions into the parser. This is bad because in SQL it creates new reserved words for things that aren't actually keywords. Also it means that for each function we need to add support to both SQLContext and HiveContext separately.
For this JIRA I propose we do the following:
- Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
- Add a rule to analysis that does type coercion for UDFs.
- Add a parse rule for functions to SQLParser.
- Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
Depending on how big this refactoring becomes we could split parts 1 and 2 from part 3 above.
[jira] [Resolved] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-4867. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I believe this is done - we are now doing type coercions for UDFs. UDF clean up Key: SPARK-4867 URL: https://issues.apache.org/jira/browse/SPARK-4867 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Reynold Xin Priority: Blocker Fix For: 1.5.0 Right now our support for and internal implementation of many functions have a few issues. Specifically:
- UDFs don't know their input types and thus don't do type coercion.
- We hard-code a bunch of built-in functions into the parser. This is bad because in SQL it creates new reserved words for things that aren't actually keywords. Also it means that for each function we need to add support to both SQLContext and HiveContext separately.
For this JIRA I propose we do the following:
- Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
- Add a rule to analysis that does type coercion for UDFs.
- Add a parse rule for functions to SQLParser.
- Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
Depending on how big this refactoring becomes we could split parts 1 and 2 from part 3 above.
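The core of the registerFunction proposal above is that a UDF registered with declared input and output types lets the analyzer insert casts instead of failing at runtime. A minimal Python stand-in for that idea (Spark's real rule rewrites the Catalyst expression tree; `register_function` and `invoke` here are illustrative names, not Spark APIs):

```python
def register_function(name, func, input_types, return_type, registry):
    """Register a UDF together with its declared input and output types,
    so invocation can coerce arguments the way an analysis rule would."""
    registry[name] = (func, input_types, return_type)

def invoke(name, args, registry):
    """Look up a registered UDF and apply analysis-style type coercion:
    cast each argument to its declared type before calling."""
    func, input_types, return_type = registry[name]
    coerced = [t(a) for t, a in zip(input_types, args)]
    return return_type(func(*coerced))

registry = {}
# A UDF declared as (str) -> int; an integer argument is coerced to str
# rather than raising a type error.
register_function("str_len", lambda s: len(s), [str], int, registry)
```

With this, `invoke("str_len", [12345], registry)` coerces `12345` to `"12345"` first, instead of the UDF blowing up on a non-string input.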
[jira] [Assigned] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9019: --- Assignee: Apache Spark spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Assignee: Apache Spark Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632341#comment-14632341 ] Bolke de Bruin commented on SPARK-9019: --- I have created PR-#7489 for this issue. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Assigned] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9019: --- Assignee: (was: Apache Spark) spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
	at
[jira] [Issue Comment Deleted] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated SPARK-9019: -- Comment: was deleted (was: I have created PR-#7489 for this issue. ) spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
[jira] [Created] (SPARK-9161) Implement code generation for FormatNumber
Reynold Xin created SPARK-9161: -- Summary: Implement code generation for FormatNumber Key: SPARK-9161 URL: https://issues.apache.org/jira/browse/SPARK-9161 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
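FormatNumber corresponds to Hive's format_number(x, d): the value is rendered with comma-grouped thousands separators and d decimal places. A rough Python model of those semantics (an illustration of the expected output, not Spark's generated code):

```python
def format_number(x, d):
    # Comma-grouped thousands separators, rounded to d decimal places,
    # mirroring Hive's format_number(x, d).
    return f"{x:,.{d}f}"
```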
[jira] [Created] (SPARK-9163) Implement code generation for Conv
Reynold Xin created SPARK-9163: -- Summary: Implement code generation for Conv Key: SPARK-9163 URL: https://issues.apache.org/jira/browse/SPARK-9163 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9159) Implement code generation for Ascii, Base64, and UnBase64
Reynold Xin created SPARK-9159: -- Summary: Implement code generation for Ascii, Base64, and UnBase64 Key: SPARK-9159 URL: https://issues.apache.org/jira/browse/SPARK-9159 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9158) PyLint should only fail on error
Davies Liu created SPARK-9158: - Summary: PyLint should only fail on error Key: SPARK-9158 URL: https://issues.apache.org/jira/browse/SPARK-9158 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Davies Liu Priority: Critical It's boring to fight with warnings from Pylint.
[jira] [Created] (SPARK-9160) Implement code generation for Encode and Decode
Reynold Xin created SPARK-9160: -- Summary: Implement code generation for Encode and Decode Key: SPARK-9160 URL: https://issues.apache.org/jira/browse/SPARK-9160 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Resolved] (SPARK-5288) Stabilize Spark SQL data type API followup
[ https://issues.apache.org/jira/browse/SPARK-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5288. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I think everything here has been done. Stabilize Spark SQL data type API followup --- Key: SPARK-5288 URL: https://issues.apache.org/jira/browse/SPARK-5288 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Assignee: Reynold Xin Fix For: 1.5.0 Several issues we need to address before release 1.3:
* Do we want to make all classes in org.apache.spark.sql.types.dataTypes.scala public? Seems we do not need to make those abstract classes public.
* Seems NativeType is not a very clear and useful concept. Should we just remove it?
* We need to stabilize the type hierarchy of our data types. Seems StringType and DecimalType should not be primitive types.
[jira] [Updated] (SPARK-8829) Improve expression performance
[ https://issues.apache.org/jira/browse/SPARK-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8829: --- Description: This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. These expressions can be found in the org.apache.spark.sql.catalyst.expressions package. was: This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. Improve expression performance -- Key: SPARK-8829 URL: https://issues.apache.org/jira/browse/SPARK-8829 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin This is an umbrella ticket for various improvements to DataFrame and SQL expression performance. These expressions can be found in the org.apache.spark.sql.catalyst.expressions package.
[jira] [Resolved] (SPARK-5295) Stabilize data types
[ https://issues.apache.org/jira/browse/SPARK-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5295. Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 1.5.0 I believe the important things in this ticket have been done. We haven't explicitly defined external/internal types yet. We should create a new ticket for that. Stabilize data types Key: SPARK-5295 URL: https://issues.apache.org/jira/browse/SPARK-5295 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
1. We expose all the stuff in data types right now, including NumericTypes, etc. These should be hidden from users. We should only expose the leaf types.
2. Remove the DeveloperAPI tag from the common types.
3. Specify the internal type, external Scala type, and external Java type for each data type.
4. Add conversion functions between the internal type, external Scala type, and external Java type into each type.
[jira] [Updated] (SPARK-9081) fillna/dropna should also fill/drop NaN values in addition to null values
[ https://issues.apache.org/jira/browse/SPARK-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9081: --- Priority: Blocker (was: Major) fillna/dropna should also fill/drop NaN values in addition to null values - Key: SPARK-9081 URL: https://issues.apache.org/jira/browse/SPARK-9081 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Blocker
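Context for why NaN needs handling separate from null: NaN is an ordinary floating-point value, not a null, and it never compares equal to anything, including itself, so equality-based null checks can never catch it. A short illustrative Python snippet (not Spark code):

```python
import math

nan = float("nan")

# NaN is not null: it is a real floating-point value.
assert nan is not None
# NaN does not compare equal even to itself, so equality-based filters
# never match it; an explicit isnan check is required.
assert nan != nan
assert math.isnan(nan)
```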
[jira] [Assigned] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9167: --- Assignee: (was: Apache Spark) call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Commented] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632328#comment-14632328 ] Apache Spark commented on SPARK-9167: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7488 call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-9167) call `millisToDays` in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9167: --- Assignee: Apache Spark call `millisToDays` in `stringToDate` - Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark
[jira] [Updated] (SPARK-9167) use UTC Calendar in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-9167: --- Summary: use UTC Calendar in `stringToDate` (was: call `millisToDays` in `stringToDate`) use UTC Calendar in `stringToDate` -- Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
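The motivation behind this rename: converting a date string to a day count must interpret the string with a UTC calendar, otherwise the result shifts with the JVM's default timezone. A hedged Python sketch of the timezone-independent computation (function name assumed, not Spark's implementation):

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def string_to_date_days(s):
    # Interpret "yyyy-MM-dd" as midnight UTC, then count whole days since
    # the Unix epoch; using the default-timezone calendar instead would
    # skew the day number by the local zone offset.
    dt = datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) // SECONDS_PER_DAY
```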
[jira] [Assigned] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9055: --- Assignee: Apache Spark WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Commented] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632344#comment-14632344 ] Apache Spark commented on SPARK-9055: - User 'yijieshen' has created a pull request for this issue: https://github.com/apache/spark/pull/7491 WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Assigned] (SPARK-9055) WidenTypes should also support Intersect and Except
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9055: --- Assignee: (was: Apache Spark) WidenTypes should also support Intersect and Except --- Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Assigned] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9150: --- Assignee: Reynold Xin (was: Apache Spark) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create CodegenFallback and Unevaluable trait
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Description: It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Additionally, this patch creates an Unevaluable trait that can be used to track expressions that don't support evaluation (e.g. Star). was: It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Create CodegenFallback and Unevaluable trait Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already. Additionally, this patch creates an Unevaluable trait that can be used to track expressions that don't support evaluation (e.g. Star). 
[jira] [Assigned] (SPARK-9150) Create a trait to track code generation for expressions
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9150: --- Assignee: Apache Spark (was: Reynold Xin) Create a trait to track code generation for expressions --- Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Updated] (SPARK-9150) Create CodegenFallback and Unevaluable trait
[ https://issues.apache.org/jira/browse/SPARK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9150: --- Summary: Create CodegenFallback and Unevaluable trait (was: Create a trait to track code generation for expressions) Create CodegenFallback and Unevaluable trait Key: SPARK-9150 URL: https://issues.apache.org/jira/browse/SPARK-9150 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical It is very hard to track which expression supports code generation or not. This patch removes the default gencode implementation from Expression, and moves the default fallback implementation into a new trait called CodegenFallback. Each concrete expression needs to either implement code generation, or mix in CodegenFallback. This makes it very easy to track which expressions have code generation implemented already.
[jira] [Created] (SPARK-9164) Implement code generation for Hex and Unhex
Reynold Xin created SPARK-9164: -- Summary: Implement code generation for Hex and Unhex Key: SPARK-9164 URL: https://issues.apache.org/jira/browse/SPARK-9164 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
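For reference, the behavior of Hex and Unhex on binary input can be modeled in a few lines of Python: two uppercase hex digits per byte, and the inverse decoding (Hex applied to a numeric input behaves differently and is not shown; this is an illustration of the semantics, not Spark's generated code):

```python
def hex_bytes(b):
    # Two uppercase hex digits per input byte, e.g. b"\x0a" -> "0A".
    return b.hex().upper()

def unhex(s):
    # Inverse of hex_bytes: decode pairs of hex digits back into bytes.
    return bytes.fromhex(s)
```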
[jira] [Created] (SPARK-9162) Implement code generation for ScalaUDF
Reynold Xin created SPARK-9162: -- Summary: Implement code generation for ScalaUDF Key: SPARK-9162 URL: https://issues.apache.org/jira/browse/SPARK-9162 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9165) Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct
Reynold Xin created SPARK-9165: -- Summary: Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct Key: SPARK-9165 URL: https://issues.apache.org/jira/browse/SPARK-9165 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9167) call `millisToDays` in `stringToDate`
Wenchen Fan created SPARK-9167: -- Summary: call `millisToDays` in `stringToDate` Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan
[jira] [Created] (SPARK-9156) Implement code generation for StringSplit
Reynold Xin created SPARK-9156: -- Summary: Implement code generation for StringSplit Key: SPARK-9156 URL: https://issues.apache.org/jira/browse/SPARK-9156 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9153) Implement code generation for StringLPad and StringRPad
Reynold Xin created SPARK-9153: -- Summary: Implement code generation for StringLPad and StringRPad Key: SPARK-9153 URL: https://issues.apache.org/jira/browse/SPARK-9153 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
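The SQL semantics these two expressions implement: pad the input with a fill string, cycled as needed, up to the target length, truncating when the input is already longer. A small Python model of that behavior (illustrative only, not the Spark implementation):

```python
def lpad(s, n, pad):
    # Left-pad s with `pad` repeated/cycled up to length n;
    # truncate s to n characters if it is already longer.
    if len(s) >= n:
        return s[:n]
    return (pad * n)[: n - len(s)] + s

def rpad(s, n, pad):
    # Same, but the fill goes on the right.
    if len(s) >= n:
        return s[:n]
    return s + (pad * n)[: n - len(s)]
```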
[jira] [Created] (SPARK-9152) Implement code generation for Like and RLike
Reynold Xin created SPARK-9152: -- Summary: Implement code generation for Like and RLike Key: SPARK-9152 URL: https://issues.apache.org/jira/browse/SPARK-9152 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9154) Implement code generation for StringFormat
Reynold Xin created SPARK-9154: -- Summary: Implement code generation for StringFormat Key: SPARK-9154 URL: https://issues.apache.org/jira/browse/SPARK-9154 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9155) Implement code generation for StringSpace
Reynold Xin created SPARK-9155: -- Summary: Implement code generation for StringSpace Key: SPARK-9155 URL: https://issues.apache.org/jira/browse/SPARK-9155 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Created] (SPARK-9168) Add nanvl expression
Reynold Xin created SPARK-9168: -- Summary: Add nanvl expression Key: SPARK-9168 URL: https://issues.apache.org/jira/browse/SPARK-9168 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Similar to Oracle's nanvl: nanvl(v1, v2) returns v2 if v1 is NaN; otherwise it returns v1.
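A minimal Python sketch of the proposed semantics (an illustration of the behavior the ticket describes, not Spark's implementation; nulls pass through untouched, since NaN handling is separate from null handling):

```python
import math

def nanvl(v1, v2):
    # Oracle-style nanvl: substitute v2 only when v1 is NaN;
    # None (SQL NULL) and ordinary values are returned unchanged.
    if isinstance(v1, float) and math.isnan(v1):
        return v2
    return v1
```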
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632340#comment-14632340 ] Apache Spark commented on SPARK-9019: - User 'bolkedebruin' has created a pull request for this issue: https://github.com/apache/spark/pull/7489 spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
[jira] [Assigned] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9169: --- Assignee: Reynold Xin (was: Apache Spark) Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Created] (SPARK-9170) ORC data source creates a schema with lowercase table names
Peter Rudenko created SPARK-9170: Summary: ORC data source creates a schema with lowercase table names Key: SPARK-9170 URL: https://issues.apache.org/jira/browse/SPARK-9170 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Reporter: Peter Rudenko Steps to reproduce:
{code}
sqlContext.range(0, 10).select('id as "Acol").write.format("orc").save("/tmp/foo")
sqlContext.read.format("orc").load("/tmp/foo").schema("Acol")
// java.lang.IllegalArgumentException: Field Acol does not exist.
sqlContext.read.format("orc").load("/tmp/foo").schema("acol")
// org.apache.spark.sql.types.StructField = StructField(acol,LongType,true)
sqlContext.read.format("orc").load("/tmp/foo").select("Acol").show()
// +----+
// |Acol|
// +----+
// |   1|
// |   5|
// |   3|
// |   4|
// |   7|
// |   2|
// |   6|
// |   8|
// |   9|
// |   0|
// +----+
{code}
[jira] [Updated] (SPARK-9170) ORC data source creates a schema with lowercase table names
[ https://issues.apache.org/jira/browse/SPARK-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Rudenko updated SPARK-9170: Description: Steps to reproduce:
{code}
sqlContext.range(0, 10).select('id as "Acol").write.format("orc").save("/tmp/foo")

sqlContext.read.format("orc").load("/tmp/foo").schema("Acol")
// java.lang.IllegalArgumentException: Field "Acol" does not exist.

sqlContext.read.format("orc").load("/tmp/foo").schema("acol")
// org.apache.spark.sql.types.StructField = StructField(acol,LongType,true)

sqlContext.read.format("orc").load("/tmp/foo").select("Acol").show()
// +----+
// |Acol|
// +----+
// |   1|
// |   5|
// |   3|
// |   4|
// |   7|
// |   2|
// |   6|
// |   8|
// |   9|
// |   0|
// +----+
{code}
was: (the same text)

ORC data source creates a schema with lowercase table names Key: SPARK-9170 URL: https://issues.apache.org/jira/browse/SPARK-9170 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Reporter: Peter Rudenko
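The lookups above fail because StructType field access is case-sensitive while the ORC writer stored lowercase names. Until the data source preserves case, a caller can resolve requested names case-insensitively against the stored schema. A minimal Python sketch with a hypothetical helper (`resolve_field` is illustrative, not a Spark API):

```python
def resolve_field(schema_fields, requested):
    """Return the stored field name matching `requested` case-insensitively.

    schema_fields: list of field names as stored on disk (here, lowercased).
    Raises ValueError with a message shaped like the one in the report.
    """
    matches = [f for f in schema_fields if f.lower() == requested.lower()]
    if not matches:
        raise ValueError("Field %s does not exist." % requested)
    return matches[0]

# After the ORC round trip the schema only contains lowercase names:
stored = ["acol"]
print(resolve_field(stored, "Acol"))  # -> acol
```

This only helps when the lowercased names are still unambiguous; two columns differing only by case cannot be recovered this way.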
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632476#comment-14632476 ] Cody Koeninger commented on SPARK-9059: How is this different from SPARK-8390? I thought the idea there was to keep word count simple, and have separate examples for offsets. Restarting from specific offsets is a good idea, but requires storage. If we want to move the examples from my external repo https://github.com/koeninger/kafka-exactly-once into Spark, it would probably require switching from Postgres to an in-memory database.

Update Direct Kafka Word count examples to show the use of HasOffsetRanges Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter

Update the Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print them. For example, in Scala:
{code}
var offsetRanges: Array[OffsetRange] = _
...
directKafkaDStream.foreachRDD { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
}
...
transformedDStream.foreachRDD { rdd =>
  // some operation
  println("Processed ranges: " + offsetRanges)
}
{code}
See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more up-to-date information on Python.
[jira] [Assigned] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9094: --- Assignee: (was: Apache Spark) Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Anders Düvel updated SPARK-9094: - Description: This change is described in pull request: https://github.com/apache/spark/pull/7493 was: This change is described in pull request: https://github.com/apache/spark/pull/7422 Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7493 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9094: --- Assignee: Apache Spark Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Assignee: Apache Spark Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632377#comment-14632377 ] Apache Spark commented on SPARK-9094: - User 'hackbert' has created a pull request for this issue: https://github.com/apache/spark/pull/7493 Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7422 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632425#comment-14632425 ] Apache Spark commented on SPARK-9171: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7496 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9171: --- Assignee: (was: Apache Spark) add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9094) Increase io.dropwizard.metrics dependency to 3.1.2
[ https://issues.apache.org/jira/browse/SPARK-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Anders Düvel updated SPARK-9094: - External issue URL: https://github.com/apache/spark/pull/7493 (was: https://github.com/apache/spark/pull/7422) Increase io.dropwizard.metrics dependency to 3.1.2 -- Key: SPARK-9094 URL: https://issues.apache.org/jira/browse/SPARK-9094 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Carl Anders Düvel Priority: Minor This change is described in pull request: https://github.com/apache/spark/pull/7493 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning
[ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632373#comment-14632373 ] Apache Spark commented on SPARK-6910: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/7492 Support for pushing predicates down to metastore for partition pruning -- Key: SPARK-6910 URL: https://issues.apache.org/jira/browse/SPARK-6910 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Cheolsoo Park Priority: Critical Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8951: Assignee: Apache Spark

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Assignee: Apache Spark Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
[jira] [Commented] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632408#comment-14632408 ] Apache Spark commented on SPARK-8951: User 'CHOIJAEHONG1' has created a pull request for this issue: https://github.com/apache/spark/pull/7494

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
[jira] [Assigned] (SPARK-8951) support CJK characters in collect()
[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8951: Assignee: (was: Apache Spark)

support CJK characters in collect() Key: SPARK-8951 URL: https://issues.apache.org/jira/browse/SPARK-8951 Project: Spark Issue Type: Bug Components: SparkR Reporter: Jaehong Choi Priority: Minor Attachments: SerDe.scala.diff

Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I found out that SerDe in the R API only supports the ASCII format for strings right now, as noted in a comment in the source code. So I fixed SerDe.scala a little to support CJK, as in the attached file. I did not care about efficiency; I just wanted to see if it works.
{noformat}
people.json
{"name":"가나"}
{"name":"테스트123", "age":30}
{"name":"Justin", "age":19}

> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
Error in rawToChar(string) : embedded nul in string : '\0 \x98'
{noformat}
{code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
// NOTE: Only works for ASCII right now
def writeString(out: DataOutputStream, value: String): Unit = {
  val len = value.length
  out.writeInt(len + 1) // For the \0
  out.writeBytes(value)
  out.writeByte(0)
}
{code}
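The root cause in the `writeString` above is visible without Spark: the length written to the stream is a character count (`value.length`), but what follows is a byte payload, and for CJK text the two differ. A short Python sketch of the mismatch and of a writer that length-prefixes by encoded byte count (illustrative framing only, not the actual SparkR SerDe wire format):

```python
# Two CJK characters: 2 chars, but 6 bytes in UTF-8 (3 bytes each).
s = "가나"
assert len(s) == 2                    # what the old SerDe wrote as the length
assert len(s.encode("utf-8")) == 6    # what actually crosses the wire

def encode_string(value):
    """Length-prefix the UTF-8 payload with its byte count, not its char count."""
    data = value.encode("utf-8")
    return len(data).to_bytes(4, "big") + data

frame = encode_string(s)
# The prefix now matches the payload size, so a reader can't run short
# and hit stray continuation bytes (the 'embedded nul' symptom above).
assert int.from_bytes(frame[:4], "big") == len(frame) - 4
```

For pure ASCII the two counts coincide, which is why the original code appeared to work.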
[jira] [Created] (SPARK-9171) add and improve tests for nondeterministic expressions
Wenchen Fan created SPARK-9171: -- Summary: add and improve tests for nondeterministic expressions Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9171: --- Assignee: Apache Spark add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9081) fillna/dropna should also fill/drop NaN values in addition to null values
[ https://issues.apache.org/jira/browse/SPARK-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632469#comment-14632469 ] Yijie Shen commented on SPARK-9081: --- OK, I'll take this fillna/dropna should also fill/drop NaN values in addition to null values - Key: SPARK-9081 URL: https://issues.apache.org/jira/browse/SPARK-9081 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9142) Removing unnecessary self types in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632422#comment-14632422 ] Apache Spark commented on SPARK-9142: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7495 Removing unnecessary self types in Catalyst --- Key: SPARK-9142 URL: https://issues.apache.org/jira/browse/SPARK-9142 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 A small change, based on code review and offline discussion with [~dragos]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632528#comment-14632528 ] Sean Owen commented on SPARK-9059: Yeah, closing this as a duplicate unless someone can update the title to differentiate it from "Update DirectKafkaWordCount examples to show how offset ranges can be used" and https://github.com/apache/spark/pull/6863/files, which was committed.

Update Direct Kafka Word count examples to show the use of HasOffsetRanges Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter

Update the Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print them. For example, in Scala:
{code}
var offsetRanges: Array[OffsetRange] = _
...
directKafkaDStream.foreachRDD { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
}
...
transformedDStream.foreachRDD { rdd =>
  // some operation
  println("Processed ranges: " + offsetRanges)
}
{code}
See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more up-to-date information on Python.
[jira] [Resolved] (SPARK-8390) Update DirectKafkaWordCount examples to show how offset ranges can be used
[ https://issues.apache.org/jira/browse/SPARK-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-8390. -- Resolution: Fixed Fix Version/s: 1.5.0 1.4.2 Target Version/s: (was: 1.5.0) Update DirectKafkaWordCount examples to show how offset ranges can be used -- Key: SPARK-8390 URL: https://issues.apache.org/jira/browse/SPARK-8390 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Cody Koeninger Fix For: 1.4.2, 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9151: --- Assignee: (was: Apache Spark) Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9151: --- Assignee: Apache Spark Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9151: --- Assignee: Liang-Chi Hsieh Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Liang-Chi Hsieh -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8159) Improve expression function coverage
[ https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8159: Summary: Improve expression function coverage (was: Improve SQL/DataFrame expression coverage)

Improve expression function coverage Key: SPARK-8159 URL: https://issues.apache.org/jira/browse/SPARK-8159 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin

This is an umbrella ticket to track new expressions we are adding to SQL/DataFrame. For each new expression, we should:
1. Add a new Expression implementation in org.apache.spark.sql.catalyst.expressions.
2. If applicable, implement the code-generated version (by implementing genCode).
3. Add comprehensive unit tests (for all the data types the expression supports).
4. If applicable, add a new function for DataFrame in org.apache.spark.sql.functions, and in python/pyspark/sql/functions.py for Python.
For date/time functions, put them in expressions/datetime.scala, and create a DateTimeFunctionSuite.scala for testing.
[jira] [Updated] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9171: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-8159 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Trivial Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9151. Resolution: Fixed Fix Version/s: 1.5.0 Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Liang-Chi Hsieh Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9101) Can't use null in selectExpr
[ https://issues.apache.org/jira/browse/SPARK-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9101: Assignee: Apache Spark

Can't use null in selectExpr Key: SPARK-9101 URL: https://issues.apache.org/jira/browse/SPARK-9101 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Mateusz Buśkiewicz Assignee: Apache Spark

In 1.3.1 this worked:
{code:python}
df = sqlContext.createDataFrame([[1]], schema=['col'])
df.selectExpr('null as newCol').collect()
{code}
In 1.4.0 it fails with the following stacktrace:
{code}
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 316, in collect
    cls = _create_cls(self.schema)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 229, in schema
    self._schema = _parse_datatype_json_string(self._jdf.schema().json())
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 519, in _parse_datatype_json_string
    return _parse_datatype_json_value(json.loads(json_string))
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 539, in _parse_datatype_json_value
    return _all_complex_types[tpe].fromJson(json_value)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 386, in fromJson
    return StructType([StructField.fromJson(f) for f in json["fields"]])
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 347, in fromJson
    _parse_datatype_json_value(json["type"]),
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 535, in _parse_datatype_json_value
    raise ValueError("Could not parse datatype: %s" % json_value)
ValueError: Could not parse datatype: null
{code}
https://github.com/apache/spark/blob/v1.4.0/python/pyspark/sql/types.py#L461
The cause: _atomic_types doesn't contain NullType.
[jira] [Commented] (SPARK-9101) Can't use null in selectExpr
[ https://issues.apache.org/jira/browse/SPARK-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632571#comment-14632571 ] Apache Spark commented on SPARK-9101: User 'sixers' has created a pull request for this issue: https://github.com/apache/spark/pull/7499

Can't use null in selectExpr Key: SPARK-9101 URL: https://issues.apache.org/jira/browse/SPARK-9101 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0, 1.4.1 Reporter: Mateusz Buśkiewicz

In 1.3.1 this worked:
{code:python}
df = sqlContext.createDataFrame([[1]], schema=['col'])
df.selectExpr('null as newCol').collect()
{code}
In 1.4.0 it fails with the following stacktrace:
{code}
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 316, in collect
    cls = _create_cls(self.schema)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/dataframe.py", line 229, in schema
    self._schema = _parse_datatype_json_string(self._jdf.schema().json())
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 519, in _parse_datatype_json_string
    return _parse_datatype_json_value(json.loads(json_string))
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 539, in _parse_datatype_json_value
    return _all_complex_types[tpe].fromJson(json_value)
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 386, in fromJson
    return StructType([StructField.fromJson(f) for f in json["fields"]])
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 347, in fromJson
    _parse_datatype_json_value(json["type"]),
  File "/opt/boxen/homebrew/opt/apache-spark/libexec/python/pyspark/sql/types.py", line 535, in _parse_datatype_json_value
    raise ValueError("Could not parse datatype: %s" % json_value)
ValueError: Could not parse datatype: null
{code}
https://github.com/apache/spark/blob/v1.4.0/python/pyspark/sql/types.py#L461
The cause: _atomic_types doesn't contain NullType.
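The failure mode can be shown in isolation: the parser keeps a name-to-type table for atomic types and falls through to the ValueError when "null" is missing. A simplified stand-in for the real pyspark.sql.types code (the table contents here are made up for illustration):

```python
# Toy model of _parse_datatype_json_value: atomic type names map to types.
# "null" is deliberately absent, mirroring the bug.
_atomic_types = {"integer": int, "string": str, "boolean": bool}

def parse_datatype(json_value):
    """Resolve an atomic type name, like the branch that raises in the report."""
    if json_value in _atomic_types:
        return _atomic_types[json_value]
    raise ValueError("Could not parse datatype: %s" % json_value)

try:
    parse_datatype("null")
except ValueError as e:
    print(e)  # Could not parse datatype: null

# The fix amounts to registering the null type alongside the others:
_atomic_types["null"] = type(None)
assert parse_datatype("null") is type(None)
```

With the entry registered, schemas containing `null as newCol` columns round-trip through the JSON parser instead of raising.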
[jira] [Resolved] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-8240. Resolution: Fixed Fix Version/s: 1.5.0 string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9109) Unpersist a graph object does not work properly
[ https://issues.apache.org/jira/browse/SPARK-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien-Dung LE closed SPARK-9109. Resolution: Fixed The change has been merged at https://github.com/apache/spark/pull/7469

Unpersist a graph object does not work properly Key: SPARK-9109 URL: https://issues.apache.org/jira/browse/SPARK-9109 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.3.1, 1.4.0 Reporter: Tien-Dung LE Priority: Minor

Unpersisting a graph object does not work properly. Here is the code to reproduce it:
{code}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.slf4j.LoggerFactory
import org.apache.spark.graphx.util.GraphGenerators

val graph: Graph[Long, Long] = GraphGenerators
  .logNormalGraph(sc, numVertices = 100)
  .mapVertices((id, _) => id.toLong)
  .mapEdges(e => e.attr.toLong)

graph.cache().numEdges
graph.unpersist()
{code}
After this, there should not be any cached RDDs in storage (http://localhost:4040/storage/), but some remain.
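The symptom reported above is consistent with a composite object that is backed by more than one cached dataset (a Graph caches both its vertices and its edges), where the unpersist path releases only part of what was persisted. A toy Python model of that failure shape, assuming this framing; it is not GraphX internals:

```python
class Cache:
    """Stand-in for the block manager's set of cached entries."""
    def __init__(self):
        self.entries = set()

    def persist(self, name):
        self.entries.add(name)

    def unpersist(self, name):
        self.entries.discard(name)


class Graph:
    """A composite object persisting two underlying datasets on creation."""
    def __init__(self, cache):
        self.cache = cache
        cache.persist("vertices")
        cache.persist("edges")

    def unpersist_broken(self):
        # Releases only one of the two backing datasets -> a leaked entry.
        self.cache.unpersist("vertices")

    def unpersist_fixed(self):
        # A correct unpersist must release everything persist() acquired.
        self.cache.unpersist("vertices")
        self.cache.unpersist("edges")


cache = Cache()
g = Graph(cache)
g.unpersist_broken()
assert cache.entries == {"edges"}   # leftover cached block, as in the report
g.unpersist_fixed()
assert cache.entries == set()       # storage page would now be empty
```

The general lesson: every resource acquired in the persist path needs a matching release in the unpersist path.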
[jira] [Commented] (SPARK-7961) Redesign SQLConf for better error message reporting
[ https://issues.apache.org/jira/browse/SPARK-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632593#comment-14632593 ] Reynold Xin commented on SPARK-7961: [~zsxwing] I tried this today, and it looks like it doesn't work in bin/spark-sql using "set -v". Can you take a look?

Redesign SQLConf for better error message reporting Key: SPARK-7961 URL: https://issues.apache.org/jira/browse/SPARK-7961 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Shixiong Zhu Priority: Critical Fix For: 1.5.0

Right now, we don't validate config values and as a result will throw exceptions when queries or DataFrame operations are run. Imagine one user sets the config variable spark.sql.retainGroupColumns (which requires "true" or "false") to "hello". The set action itself completes fine. When another user runs a query, it throws the following exception:
{code}
java.lang.IllegalArgumentException: For input string: "hello"
  at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:238)
  at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:226)
  at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:31)
  at org.apache.spark.sql.SQLConf.dataFrameRetainGroupColumns(SQLConf.scala:265)
  at org.apache.spark.sql.GroupedData.toDF(GroupedData.scala:74)
  at org.apache.spark.sql.GroupedData.agg(GroupedData.scala:227)
{code}
This is highly confusing. We should redesign SQLConf to validate input at set time (during the setConf call).
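The redesign the ticket asks for is fail-fast validation: reject a bad value when it is set, with the offending key in the message, instead of raising later when the value is read. A minimal Python sketch of that idea (class and function names here are illustrative, not Spark's SQLConf code):

```python
class ConfEntry:
    """A typed config key: the converter both parses and validates."""
    def __init__(self, key, converter):
        self.key = key
        self.converter = converter


def to_bool(raw):
    if raw.lower() not in ("true", "false"):
        raise ValueError("value must be 'true' or 'false', got %r" % raw)
    return raw.lower() == "true"


RETAIN_GROUP_COLUMNS = ConfEntry("spark.sql.retainGroupColumns", to_bool)


class SQLConf:
    def __init__(self):
        self.settings = {}

    def set(self, entry, raw):
        # Validate at set time: a bad value fails here, at the point of
        # misuse, rather than inside an unrelated query later.
        try:
            self.settings[entry.key] = entry.converter(raw)
        except ValueError as e:
            raise ValueError("%s: %s" % (entry.key, e))


conf = SQLConf()
conf.set(RETAIN_GROUP_COLUMNS, "true")      # accepted
try:
    conf.set(RETAIN_GROUP_COLUMNS, "hello")  # rejected immediately
except ValueError as e:
    print(e)
```

The key design point is that the error message names the config key and the allowed values, instead of the bare "For input string" from a deep parse failure.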
[jira] [Created] (SPARK-9174) Add documentation for all public SQLConfs
Reynold Xin created SPARK-9174: -- Summary: Add documentation for all public SQLConfs Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9174. Resolution: Fixed Fix Version/s: 1.5.0 Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
[jira] [Resolved] (SPARK-9055) WidenTypes should also support Intersect and Except in addition to Union
[ https://issues.apache.org/jira/browse/SPARK-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9055. Resolution: Fixed Assignee: Yijie Shen Fix Version/s: 1.5.0 WidenTypes should also support Intersect and Except in addition to Union Key: SPARK-9055 URL: https://issues.apache.org/jira/browse/SPARK-9055 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Yijie Shen Fix For: 1.5.0 HiveTypeCoercion.WidenTypes only supports Union right now. It should also support Intersect and Except.
[jira] [Commented] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632595#comment-14632595 ] Apache Spark commented on SPARK-9174: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7500 Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Assigned] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9174: --- Assignee: Apache Spark (was: Reynold Xin) Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Apache Spark
[jira] [Assigned] (SPARK-9174) Add documentation for all public SQLConfs
[ https://issues.apache.org/jira/browse/SPARK-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9174: --- Assignee: Reynold Xin (was: Apache Spark) Add documentation for all public SQLConfs - Key: SPARK-9174 URL: https://issues.apache.org/jira/browse/SPARK-9174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Commented] (SPARK-8278) Remove deprecated JsonRDD functionality in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632596#comment-14632596 ] Apache Spark commented on SPARK-8278: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/7501 Remove deprecated JsonRDD functionality in Spark SQL Key: SPARK-8278 URL: https://issues.apache.org/jira/browse/SPARK-8278 Project: Spark Issue Type: Story Components: SQL Reporter: Nathan Howell Priority: Critical The old JSON functionality (deprecated in 1.4) needs to be removed for 1.5.
[jira] [Commented] (SPARK-6793) Implement perplexity for LDA
[ https://issues.apache.org/jira/browse/SPARK-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632598#comment-14632598 ] Feynman Liang commented on SPARK-6793: -- I'm working on this. Implement perplexity for LDA Key: SPARK-6793 URL: https://issues.apache.org/jira/browse/SPARK-6793 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Original Estimate: 168h Remaining Estimate: 168h LDA should be able to compute perplexity. This JIRA is for computing it on the training dataset. See the linked JIRA for computing it on a new corpus: [SPARK-5567]
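For context on the metric the ticket asks for: corpus perplexity is conventionally derived from the log-likelihood the model already computes. A minimal sketch of the standard definition (not Spark's implementation):

```python
import math

def perplexity(log_likelihood, token_count):
    """Standard corpus perplexity: exp(-logLikelihood / #tokens).

    Lower is better; it can be read as the model's effective
    per-token branching factor on the corpus.
    """
    return math.exp(-log_likelihood / token_count)
```

For example, a model that assigns each of 100 tokens probability 1/2 has log-likelihood 100 * ln(1/2) and therefore perplexity 2.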
[jira] [Commented] (SPARK-5567) Add prediction methods to LDA
[ https://issues.apache.org/jira/browse/SPARK-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632597#comment-14632597 ] Feynman Liang commented on SPARK-5567: -- I'm working on this. Add prediction methods to LDA - Key: SPARK-5567 URL: https://issues.apache.org/jira/browse/SPARK-5567 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Original Estimate: 168h Remaining Estimate: 168h LDA currently supports prediction on the training set. E.g., you can call logLikelihood and topicDistributions to get that info for the training data. However, it should support the same functionality for new (test) documents. This will require inference but should be able to use the same code, with a few modifications to keep the inferred topics fixed. Note: The API for these methods is already in the code but is commented out.
[jira] [Updated] (SPARK-7690) MulticlassClassificationEvaluator for tuning Multiclass Classifiers
[ https://issues.apache.org/jira/browse/SPARK-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7690: - Shepherd: Joseph K. Bradley (was: Ram Sriharsha) MulticlassClassificationEvaluator for tuning Multiclass Classifiers --- Key: SPARK-7690 URL: https://issues.apache.org/jira/browse/SPARK-7690 Project: Spark Issue Type: Improvement Components: ML Reporter: Ram Sriharsha Assignee: Eron Wright Provide a MulticlassClassificationEvaluator with weighted F1-score to tune multiclass classifiers using the Pipeline API. MLlib already provides MulticlassMetrics functionality, which can be wrapped in a MulticlassClassificationEvaluator to expose weighted F1-score as a metric. The functionality could be similar to scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) in that we can support micro, macro, and weighted versions of the F1-score (with weighted being the default).
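The micro/macro/weighted averaging modes the ticket mentions can be illustrated from scratch. This is a sketch of the metric definitions only, assuming nothing about Spark's or scikit-learn's actual APIs:

```python
from collections import Counter

def f1_scores(y_true, y_pred, average="weighted"):
    """Micro, macro, or support-weighted F1 over all observed labels.

    A from-scratch illustration of the three averaging modes named in
    the ticket; not Spark's evaluator.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    if average == "micro":
        # Pool counts over all classes, then compute a single F1.
        return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class = {l: f1(tp[l], fp[l], fn[l]) for l in labels}
    if average == "macro":
        # Unweighted mean of per-class F1.
        return sum(per_class.values()) / len(labels)
    # "weighted": mean of per-class F1, weighted by true-label support.
    support, n = Counter(y_true), len(y_true)
    return sum(per_class[l] * support[l] / n for l in labels)
```

The weighted variant makes the default robust to class imbalance: frequent classes contribute proportionally more to the score than rare ones.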
[jira] [Updated] (SPARK-7690) MulticlassClassificationEvaluator for tuning Multiclass Classifiers
[ https://issues.apache.org/jira/browse/SPARK-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7690: - Issue Type: New Feature (was: Improvement) MulticlassClassificationEvaluator for tuning Multiclass Classifiers --- Key: SPARK-7690 URL: https://issues.apache.org/jira/browse/SPARK-7690 Project: Spark Issue Type: New Feature Components: ML Reporter: Ram Sriharsha Assignee: Ram Sriharsha Provide a MulticlassClassificationEvaluator with weighted F1-score to tune multiclass classifiers using the Pipeline API. MLlib already provides MulticlassMetrics functionality, which can be wrapped in a MulticlassClassificationEvaluator to expose weighted F1-score as a metric. The functionality could be similar to scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) in that we can support micro, macro, and weighted versions of the F1-score (with weighted being the default).
[jira] [Commented] (SPARK-9151) Implement code generation for Abs
[ https://issues.apache.org/jira/browse/SPARK-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632533#comment-14632533 ] Apache Spark commented on SPARK-9151: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/7498 Implement code generation for Abs - Key: SPARK-9151 URL: https://issues.apache.org/jira/browse/SPARK-9151 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin
[jira] [Resolved] (SPARK-9169) Improve unit test coverage for null expressions
[ https://issues.apache.org/jira/browse/SPARK-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9169. Resolution: Fixed Fix Version/s: 1.5.0 Improve unit test coverage for null expressions --- Key: SPARK-9169 URL: https://issues.apache.org/jira/browse/SPARK-9169 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0
[jira] [Resolved] (SPARK-9171) add and improve tests for nondeterministic expressions
[ https://issues.apache.org/jira/browse/SPARK-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9171. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.5.0 add and improve tests for nondeterministic expressions -- Key: SPARK-9171 URL: https://issues.apache.org/jira/browse/SPARK-9171 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Trivial Fix For: 1.5.0
[jira] [Commented] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632494#comment-14632494 ] Apache Spark commented on SPARK-9166: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/7497 Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
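The general technique behind this ticket is to inspect the Java exception class carried by a Py4J error and re-raise it as a plain Python exception with the JVM traceback suppressed. A hedged sketch follows; the `JavaError` class and `convert_java_error` decorator are stand-ins for illustration, not PySpark's or py4j's actual helpers:

```python
# Illustrative sketch only: convert a wrapped Java-side exception into a
# clean Python exception, dropping the JVM stack trace. The names here
# (JavaError, convert_java_error) are hypothetical stand-ins.

class JavaError(Exception):
    """Stand-in for a Py4J error that wraps a Java-side exception."""
    def __init__(self, java_class, message):
        super().__init__(message)
        self.java_class = java_class

def convert_java_error(fn):
    """Re-raise selected Java exceptions as plain Python exceptions."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except JavaError as e:
            if e.java_class == "java.lang.IllegalArgumentException":
                # 'from None' suppresses the chained (JVM-side) traceback
                # so the user sees only the Python-level error.
                raise ValueError(str(e)) from None
            raise
    return wrapper
```

Applied to DataFrame methods, a wrapper like this would surface `ValueError: ...` to the user instead of a long Py4JJavaError stack trace.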
[jira] [Assigned] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9166: --- Assignee: (was: Apache Spark) Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
[jira] [Assigned] (SPARK-9166) Hide JVM stack trace for IllegalArgumentException in Python
[ https://issues.apache.org/jira/browse/SPARK-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9166: --- Assignee: Apache Spark Hide JVM stack trace for IllegalArgumentException in Python --- Key: SPARK-9166 URL: https://issues.apache.org/jira/browse/SPARK-9166 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark We now hide the stack trace for AnalysisException. We should also hide it for IllegalArgumentException. See this pull request for how to fix the problem: https://github.com/apache/spark/pull/7135
[jira] [Resolved] (SPARK-9167) use UTC Calendar in `stringToDate`
[ https://issues.apache.org/jira/browse/SPARK-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9167. Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 1.5.0 use UTC Calendar in `stringToDate` -- Key: SPARK-9167 URL: https://issues.apache.org/jira/browse/SPARK-9167 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 1.5.0