[jira] [Resolved] (SPARK-9081) fillna/dropna should also fill/drop NaN values in addition to null values
[ https://issues.apache.org/jira/browse/SPARK-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-9081.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7523 [https://github.com/apache/spark/pull/7523]

fillna/dropna should also fill/drop NaN values in addition to null values
    Key: SPARK-9081
    URL: https://issues.apache.org/jira/browse/SPARK-9081
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Priority: Blocker
    Fix For: 1.5.0
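A minimal sketch of the behavior this change targets, using the {{df.na}} (DataFrameNaFunctions) entry point; the column names are illustrative:

{code}
// after SPARK-9081, na.fill/na.drop treat NaN like null for double columns
val df = sqlContext.createDataFrame(Seq(
  ("a", Double.NaN), ("b", 1.0))).toDF("key", "value")

df.na.fill(0.0).show()  // NaN in "value" becomes 0.0
df.na.drop().show()     // the row containing NaN is dropped
{code}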
[jira] [Updated] (SPARK-8915) Add @since tags to mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-8915:
    Assignee: Xiangrui Meng

Add @since tags to mllib.classification
    Key: SPARK-8915
    URL: https://issues.apache.org/jira/browse/SPARK-8915
    Project: Spark
    Issue Type: Sub-task
    Components: Documentation, MLlib
    Reporter: Xiangrui Meng
    Assignee: Xiangrui Meng
    Priority: Minor
    Labels: starter
    Fix For: 1.5.0
    Original Estimate: 1h
    Remaining Estimate: 1h
[jira] [Commented] (SPARK-8641) Native Spark Window Functions
[ https://issues.apache.org/jira/browse/SPARK-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635205#comment-14635205 ]

Herman van Hovell commented on SPARK-8641:

We need to wait for the new UDAF interface to stabilize. Special attention needs to be paid to the following aspects:
* Hive UDAFs
* Differences in processing an AlgebraicAggregate, an AggregateFunction2, and (potentially) an AggregateFunction
* Common aggregate processing functionality.

Native Spark Window Functions
    Key: SPARK-8641
    URL: https://issues.apache.org/jira/browse/SPARK-8641
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Affects Versions: 1.5.0
    Reporter: Herman van Hovell

The current Window implementation uses Hive UDAFs for all aggregation operations. In this ticket we will move this functionality to native Spark expressions. The rationale is that although Hive UDAFs are very well written, they remain opaque in processing and memory management, which makes them hard to optimize. This ticket and its PR will build on the work being done in SPARK-4366.
[jira] [Updated] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin updated SPARK-9019:
    Attachment: debug-log-spark-1.5-fail
                spark-submit-log-1.5.0-fail

spark-submit fails on yarn with kerberos enabled
    Key: SPARK-9019
    URL: https://issues.apache.org/jira/browse/SPARK-9019
    Project: Spark
    Issue Type: Bug
    Components: Spark Submit
    Affects Versions: 1.5.0
    Environment: Hadoop 2.6 with YARN and kerberos enabled
    Reporter: Bolke de Bruin
    Labels: kerberos, spark-submit, yarn
    Attachments: debug-log-spark-1.5-fail, spark-submit-log-1.5.0-fail

It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Command line:

{noformat}
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
{noformat}

Fails with:

{noformat}
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
	at
{noformat}
[jira] [Resolved] (SPARK-9193) Avoid assigning tasks to executors under killing
[ https://issues.apache.org/jira/browse/SPARK-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Imran Rashid resolved SPARK-9193.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7528 [https://github.com/apache/spark/pull/7528]

Avoid assigning tasks to executors under killing
    Key: SPARK-9193
    URL: https://issues.apache.org/jira/browse/SPARK-9193
    Project: Spark
    Issue Type: Bug
    Components: Scheduler
    Affects Versions: 1.4.0, 1.4.1
    Reporter: Jie Huang
    Assignee: Jie Huang
    Fix For: 1.5.0

Currently, when executors are killed by dynamic allocation, tasks are sometimes mis-assigned to those lost executors. Such mis-assignment causes task failures, or even job failure if the same task fails 4 times. The root cause is that killExecutors does not remove the executors under killing right away; it relies on a later OnDisassociated event to refresh the active worker list, and the delay depends on cluster status (from several milliseconds to sub-minute). Tasks scheduled during that window can be assigned to executors that are still listed as active but are being killed, and those tasks then fail due to executor loss. The better approach is to exclude executors under killing in makeOffers(), so no tasks are allocated to executors about to be lost.
[jira] [Commented] (SPARK-9220) Streaming K-means implementation exception while processing windowed stream
[ https://issues.apache.org/jira/browse/SPARK-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635240#comment-14635240 ]

Iaroslav Zeigerman commented on SPARK-9220:

Looks like the issue reproduces only when the training and test data streams are linked to the same directory. Can someone confirm whether this causes the issue?

Streaming K-means implementation exception while processing windowed stream
    Key: SPARK-9220
    URL: https://issues.apache.org/jira/browse/SPARK-9220
    Project: Spark
    Issue Type: Bug
    Components: MLlib, Streaming
    Affects Versions: 1.4.1
    Reporter: Iaroslav Zeigerman

Spark throws an exception when the Streaming K-means algorithm trains on a windowed stream. The stream looks like the following: {{val trainingSet = ssc.textFileStream(TrainingDataSet).window(Seconds(30))...}} The exception occurs when there is no new data in the stream:

{noformat}
15/07/21 17:36:08 ERROR JobScheduler: Error running job streaming job 1437489368000 ms.0
java.lang.ArrayIndexOutOfBoundsException: 13
	at org.apache.spark.mllib.clustering.StreamingKMeansModel$$anonfun$update$1.apply(StreamingKMeans.scala:105)
	at org.apache.spark.mllib.clustering.StreamingKMeansModel$$anonfun$update$1.apply(StreamingKMeans.scala:102)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
	at org.apache.spark.mllib.clustering.StreamingKMeansModel.update(StreamingKMeans.scala:102)
	at org.apache.spark.mllib.clustering.StreamingKMeans$$anonfun$trainOn$1.apply(StreamingKMeans.scala:235)
	at org.apache.spark.mllib.clustering.StreamingKMeans$$anonfun$trainOn$1.apply(StreamingKMeans.scala:234)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
	at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
	at scala.util.Try$.apply(Try.scala:161)
	at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

When new data arrives, the algorithm works as expected.
[jira] [Resolved] (SPARK-9168) Add nanvl expression
[ https://issues.apache.org/jira/browse/SPARK-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-9168.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7523 [https://github.com/apache/spark/pull/7523]

Add nanvl expression
    Key: SPARK-9168
    URL: https://issues.apache.org/jira/browse/SPARK-9168
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Yijie Shen
    Fix For: 1.5.0

Similar to Oracle's nanvl: nanvl(v1, v2) — if v1 is NaN, returns v2; otherwise, returns v1.
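A quick illustration of the semantics (a minimal sketch, assuming the expression is also exposed as {{nanvl}} in {{org.apache.spark.sql.functions}}):

{code}
import org.apache.spark.sql.functions._

val df = sqlContext.createDataFrame(Seq((Double.NaN, 1.0), (2.0, 3.0))).toDF("a", "b")
// nanvl(a, b): returns b where a is NaN, otherwise a
df.select(nanvl(col("a"), col("b"))).show()  // rows: 1.0, then 2.0
{code}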
[jira] [Created] (SPARK-9221) Support IntervalType in Range Frame
Herman van Hovell created SPARK-9221:
    Summary: Support IntervalType in Range Frame
    Key: SPARK-9221
    URL: https://issues.apache.org/jira/browse/SPARK-9221
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Affects Versions: 1.4.0
    Reporter: Herman van Hovell

Support the IntervalType in window range frames, as mentioned in the conclusion of the Databricks blog [post|https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html] on window functions. This actually requires us to support Literals instead of Integer constants in range frames. The following things will have to be modified:
* org.apache.spark.sql.hive.HiveQl
* org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame
* org.apache.spark.sql.execution.Window
* org.apache.spark.sql.expressions.Window
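A hypothetical query this feature would enable; the syntax is illustrative only, since the exact grammar depends on the HiveQl changes above, and the {{events}} table is a placeholder:

{code}
// hypothetical once IntervalType literals are allowed in range frames
sqlContext.sql("""
  SELECT key, ts, value,
         avg(value) OVER (PARTITION BY key ORDER BY ts
                          RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW)
  FROM events
""")
{code}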
[jira] [Updated] (SPARK-8915) Add @since tags to mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-8915:
    Assignee: Patrick Baier (was: Xiangrui Meng)

Add @since tags to mllib.classification
    Key: SPARK-8915
    URL: https://issues.apache.org/jira/browse/SPARK-8915
    Project: Spark
    Issue Type: Sub-task
    Components: Documentation, MLlib
    Reporter: Xiangrui Meng
    Assignee: Patrick Baier
    Priority: Minor
    Labels: starter
    Fix For: 1.5.0
    Original Estimate: 1h
    Remaining Estimate: 1h
[jira] [Updated] (SPARK-8915) Add @since tags to mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-8915:
    Shepherd: DB Tsai

Add @since tags to mllib.classification
    Key: SPARK-8915
    URL: https://issues.apache.org/jira/browse/SPARK-8915
    Project: Spark
    Issue Type: Sub-task
    Components: Documentation, MLlib
    Reporter: Xiangrui Meng
    Assignee: Patrick Baier
    Priority: Minor
    Labels: starter
    Fix For: 1.5.0
    Original Estimate: 1h
    Remaining Estimate: 1h
[jira] [Updated] (SPARK-8922) Add @since tags to mllib.evaluation
[ https://issues.apache.org/jira/browse/SPARK-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-8922:
    Shepherd: Shuo Xiang

Add @since tags to mllib.evaluation
    Key: SPARK-8922
    URL: https://issues.apache.org/jira/browse/SPARK-8922
    Project: Spark
    Issue Type: Sub-task
    Components: Documentation, MLlib
    Reporter: Xiangrui Meng
    Priority: Minor
    Labels: starter
    Original Estimate: 1h
    Remaining Estimate: 1h
[jira] [Commented] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635300#comment-14635300 ]

Shivaram Venkataraman commented on SPARK-9121:

Yeah, we can add `install-dev.sh` in Jenkins before dev/lint-r. One unfortunate thing is that we typically do a lint check before we run the rest of the Jenkins tests (build, unit tests, etc.), so it would be good not to end up with the order reversed, I guess.

Get rid of the warnings about `no visible global function definition` in SparkR
    Key: SPARK-9121
    URL: https://issues.apache.org/jira/browse/SPARK-9121
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yu Ishikawa

We have a lot of warnings about {{no visible global function definition}} in SparkR, so we should get rid of them.

{noformat}
R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’
    processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv)
    ^~
{noformat}
[jira] [Updated] (SPARK-9210) checkValidAggregateExpression() throws exceptions with bad error messages
[ https://issues.apache.org/jira/browse/SPARK-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simeon Simeonov updated SPARK-9210:
    Description: When a result column in {{SELECT ... GROUP BY}} is neither one of the {{GROUP BY}} expressions nor uses an aggregation function, {{org.apache.spark.sql.catalyst.analysis.CheckAnalysis}} throws {{org.apache.spark.sql.AnalysisException}} with the message: expression '_column expression_' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get. The remedy suggestion in the exception message is incorrect: the function name is {{first_value}}, not {{first}}.

    was: When a result column in {{SELECT ... GROUP BY}} is neither one of the {{GROUP BY}} expressions nor uses an aggregation function, {{org.apache.spark.sql.catalyst.analysis.CheckAnalysis}} throws {{org.apache.spark.sql.AnalysisException}} with the message: expression '_column expression_' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get. The remedy suggestion in the exception message incorrect: the function name is {{first_value}}, not {{first}}.

checkValidAggregateExpression() throws exceptions with bad error messages
    Key: SPARK-9210
    URL: https://issues.apache.org/jira/browse/SPARK-9210
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 1.4.1
    Environment: N/A
    Reporter: Simeon Simeonov
    Priority: Trivial

When a result column in {{SELECT ... GROUP BY}} is neither one of the {{GROUP BY}} expressions nor uses an aggregation function, {{org.apache.spark.sql.catalyst.analysis.CheckAnalysis}} throws {{org.apache.spark.sql.AnalysisException}} with the message: expression '_column expression_' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get. The remedy suggestion in the exception message is incorrect: the function name is {{first_value}}, not {{first}}.
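For context, a minimal query that exercises this error path (the table and column names are illustrative):

{code}
// "value" is neither in the GROUP BY nor wrapped in an aggregate function,
// so CheckAnalysis raises the AnalysisException quoted above
sqlContext.sql("SELECT key, value FROM src GROUP BY key")
{code}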
[jira] [Commented] (SPARK-8668) expr function to convert SQL expression into a Column
[ https://issues.apache.org/jira/browse/SPARK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636180#comment-14636180 ]

Joseph Batchik commented on SPARK-8668:

Does this look like what you were thinking? https://github.com/JDrit/spark/commit/7fcf18a11427709d403418da8d444b434c63

expr function to convert SQL expression into a Column
    Key: SPARK-8668
    URL: https://issues.apache.org/jira/browse/SPARK-8668
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin

selectExpr uses the expression parser to parse string expressions. It would be great to create an expr function in functions.scala/functions.py that converts a string into an expression (or a list of expressions separated by commas).
[jira] [Created] (SPARK-9241) Supporting multiple DISTINCT columns
Yin Huai created SPARK-9241:
    Summary: Supporting multiple DISTINCT columns
    Key: SPARK-9241
    URL: https://issues.apache.org/jira/browse/SPARK-9241
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Yin Huai
    Priority: Critical

Right now the new aggregation code path only supports a single distinct column (you can use it in multiple aggregate functions in the query). We need to support multiple distinct columns by generating a different plan for handling multiple distinct columns (without changing the aggregate functions).
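To make the limitation concrete (the table and column names are illustrative):

{code}
// supported today: one distinct column, even across multiple aggregates
sqlContext.sql("SELECT count(DISTINCT a), sum(DISTINCT a) FROM t")
// needs this ticket: DISTINCT over different columns in one query
sqlContext.sql("SELECT count(DISTINCT a), count(DISTINCT b) FROM t")
{code}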
[jira] [Created] (SPARK-9242) Audit both built-in aggregate function and UDAF interface before 1.5.0 release
Yin Huai created SPARK-9242:
    Summary: Audit both built-in aggregate function and UDAF interface before 1.5.0 release
    Key: SPARK-9242
    URL: https://issues.apache.org/jira/browse/SPARK-9242
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Yin Huai
    Priority: Blocker
[jira] [Commented] (SPARK-8668) expr function to convert SQL expression into a Column
[ https://issues.apache.org/jira/browse/SPARK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636324#comment-14636324 ]

Reynold Xin commented on SPARK-8668:

Yes — the only thing is that you cannot split blindly by comma, since commas can appear inside quotes. I think it is OK for the first cut not to support lists of expressions separated by commas.

expr function to convert SQL expression into a Column
    Key: SPARK-8668
    URL: https://issues.apache.org/jira/browse/SPARK-8668
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin

selectExpr uses the expression parser to parse string expressions. It would be great to create an expr function in functions.scala/functions.py that converts a string into an expression (or a list of expressions separated by commas).
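A sketch of the proposed function, assuming it lands as {{expr}} in functions.scala as the ticket suggests (the name comes from the proposal, not a shipped API):

{code}
import org.apache.spark.sql.functions.expr

// equivalent to df.selectExpr("abs(value)") for a single expression
df.select(expr("abs(value)"))
{code}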
[jira] [Created] (SPARK-9237) Added Top N Column Values for DataFrames
Ted Malaska created SPARK-9237:
    Summary: Added Top N Column Values for DataFrames
    Key: SPARK-9237
    URL: https://issues.apache.org/jira/browse/SPARK-9237
    Project: Spark
    Issue Type: Improvement
    Reporter: Ted Malaska
    Priority: Minor

This JIRA is to add a very common data quality check to DataFrames. A quick outline of this functionality can be seen in the following blog post: http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/

There are two parts to this JIRA:
1. How to implement the Top N count. I will start with the implementation from the blog post, as in the sketch below.
2. Where to add the function: either straight off DataFrame, in DataFrame describe, or in DataFrameStatFunctions. I will start by putting it into DataFrameStatFunctions.

Please let me know if you have any input. Thanks
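For concreteness, a minimal sketch of a top-N count for a single column against the DataFrame API; the helper name is hypothetical, not the final API:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.desc

// hypothetical helper: the N most frequent values of one column
def topNValues(df: DataFrame, column: String, n: Int): DataFrame =
  df.groupBy(column).count().orderBy(desc("count")).limit(n)
{code}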
[jira] [Assigned] (SPARK-3056) Sort-based Aggregation
[ https://issues.apache.org/jira/browse/SPARK-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-3056:
    Assignee: Apache Spark

Sort-based Aggregation
    Key: SPARK-3056
    URL: https://issues.apache.org/jira/browse/SPARK-3056
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Cheng Hao
    Assignee: Apache Spark

Currently, SparkSQL only supports hash-based aggregation, which may cause OOM if there are too many identical keys in the input tuples.
[jira] [Commented] (SPARK-3947) Support UDAF
[ https://issues.apache.org/jira/browse/SPARK-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636199#comment-14636199 ]

Apache Spark commented on SPARK-3947:

User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/7458

Support UDAF
    Key: SPARK-3947
    URL: https://issues.apache.org/jira/browse/SPARK-3947
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Pei-Lun Lee
    Assignee: Yin Huai

Right now only Hive UDAFs are supported. It would be nice to have UDAFs, similar to UDFs, registered through SQLContext.registerFunction.
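For readers tracking the PR: a hedged sketch of what a registerable UDAF might look like under the interface being developed there (method and class names could still change before release):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// a simple "sum of doubles" aggregate
class DoubleSum extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + input.getDouble(0)
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
  def evaluate(buffer: Row): Double = buffer.getDouble(0)
}

// registration, per the ticket's request
sqlContext.udf.register("doubleSum", new DoubleSum)
{code}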
[jira] [Commented] (SPARK-3056) Sort-based Aggregation
[ https://issues.apache.org/jira/browse/SPARK-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636200#comment-14636200 ]

Apache Spark commented on SPARK-3056:

User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/7458

Sort-based Aggregation
    Key: SPARK-3056
    URL: https://issues.apache.org/jira/browse/SPARK-3056
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Cheng Hao

Currently, SparkSQL only supports hash-based aggregation, which may cause OOM if there are too many identical keys in the input tuples.
[jira] [Commented] (SPARK-4367) Partial aggregation support the DISTINCT aggregation
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636198#comment-14636198 ]

Apache Spark commented on SPARK-4367:

User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/7458

Partial aggregation support the DISTINCT aggregation
    Key: SPARK-4367
    URL: https://issues.apache.org/jira/browse/SPARK-4367
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Cheng Hao

Most aggregate functions (e.g. average) over distinct values require all of the records in the same group to be shuffled to a single node. However, as part of the optimization, those records can be partially aggregated before shuffling, which probably reduces the shuffle overhead significantly.
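The kind of query this optimization targets (the table name is illustrative):

{code}
// without partial aggregation, all rows of each group are shuffled to one
// node just to deduplicate "value" before averaging
sqlContext.sql("SELECT key, avg(DISTINCT value) FROM t GROUP BY key")
{code}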
[jira] [Assigned] (SPARK-4367) Partial aggregation support the DISTINCT aggregation
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-4367:
    Assignee: Apache Spark

Partial aggregation support the DISTINCT aggregation
    Key: SPARK-4367
    URL: https://issues.apache.org/jira/browse/SPARK-4367
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Cheng Hao
    Assignee: Apache Spark

Most aggregate functions (e.g. average) over distinct values require all of the records in the same group to be shuffled to a single node. However, as part of the optimization, those records can be partially aggregated before shuffling, which probably reduces the shuffle overhead significantly.
[jira] [Commented] (SPARK-4233) Simplify the Aggregation Function implementation
[ https://issues.apache.org/jira/browse/SPARK-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636197#comment-14636197 ]

Apache Spark commented on SPARK-4233:

User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/7458

Simplify the Aggregation Function implementation
    Key: SPARK-4233
    URL: https://issues.apache.org/jira/browse/SPARK-4233
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Cheng Hao

Currently, the UDAF implementation is quite complicated, and we have to provide both distinct and non-distinct versions.
[jira] [Updated] (SPARK-9243) Update crosstab doc for pairs that have no occurrences
[ https://issues.apache.org/jira/browse/SPARK-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-9243:
    Component/s: Documentation

Update crosstab doc for pairs that have no occurrences
    Key: SPARK-9243
    URL: https://issues.apache.org/jira/browse/SPARK-9243
    Project: Spark
    Issue Type: Improvement
    Components: Documentation, PySpark, SparkR, SQL
    Affects Versions: 1.5.0
    Reporter: Xiangrui Meng

The crosstab value for pairs that have no occurrences was changed from null to 0 in SPARK-7982. We should update the doc in Scala, Python, and SparkR.
[jira] [Created] (SPARK-9243) Update crosstab doc for pairs that have no occurrences
Xiangrui Meng created SPARK-9243:
    Summary: Update crosstab doc for pairs that have no occurrences
    Key: SPARK-9243
    URL: https://issues.apache.org/jira/browse/SPARK-9243
    Project: Spark
    Issue Type: Improvement
    Components: PySpark, SparkR, SQL
    Affects Versions: 1.5.0
    Reporter: Xiangrui Meng

The crosstab value for pairs that have no occurrences was changed from null to 0 in SPARK-7982. We should update the doc in Scala, Python, and SparkR.
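The behavior change to document, as a minimal sketch (column names are illustrative):

{code}
val df = sqlContext.createDataFrame(Seq(("x", 1), ("y", 2))).toDF("a", "b")
// after SPARK-7982, the cells for ("x", 2) and ("y", 1), which never
// co-occur, contain 0 rather than null
df.stat.crosstab("a", "b").show()
{code}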
[jira] [Resolved] (SPARK-8915) Add @since tags to mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-8915.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7371 [https://github.com/apache/spark/pull/7371]

Add @since tags to mllib.classification
    Key: SPARK-8915
    URL: https://issues.apache.org/jira/browse/SPARK-8915
    Project: Spark
    Issue Type: Sub-task
    Components: Documentation, MLlib
    Reporter: Xiangrui Meng
    Priority: Minor
    Labels: starter
    Fix For: 1.5.0
    Original Estimate: 1h
    Remaining Estimate: 1h
[jira] [Closed] (SPARK-9036) SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol
[ https://issues.apache.org/jira/browse/SPARK-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-9036.
    Resolution: Fixed
    Fix Version/s: 1.5.0
    Target Version/s: 1.5.0

SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol
    Key: SPARK-9036
    URL: https://issues.apache.org/jira/browse/SPARK-9036
    Project: Spark
    Issue Type: Improvement
    Components: Spark Core
    Affects Versions: 1.4.0, 1.4.1
    Reporter: Ryan Williams
    Priority: Minor
    Fix For: 1.5.0

The JsonProtocol added in SPARK-3454 [doesn't include|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L95-L96] code for ser/de of [{{SparkListenerExecutorMetricsUpdate}}|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L107-L110] messages. The comment notes that they are not used, which presumably refers to the fact that the [{{EventLoggingListener}} doesn't write these events|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L200-L201]. However, individual listeners can and should make that determination for themselves; I have recently written custom listeners that would like to consume metrics-update messages as JSON, so it would be nice to round out the JsonProtocol implementation by supporting them.
[jira] [Closed] (SPARK-5423) ExternalAppendOnlyMap won't delete temp spilled file if some exception happens during using it
[ https://issues.apache.org/jira/browse/SPARK-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-5423.
    Resolution: Fixed
    Fix Version/s: 1.5.0
    Target Version/s: 1.5.0

ExternalAppendOnlyMap won't delete temp spilled file if some exception happens during using it
    Key: SPARK-5423
    URL: https://issues.apache.org/jira/browse/SPARK-5423
    Project: Spark
    Issue Type: Improvement
    Components: Shuffle
    Affects Versions: 1.0.0
    Reporter: Shixiong Zhu
    Assignee: Shixiong Zhu
    Fix For: 1.5.0

ExternalAppendOnlyMap won't delete the temp spilled file if some exception happens while using it. There is already a TODO in the comment:

{code}
// TODO: Ensure this gets called even if the iterator isn't drained.
private def cleanup() {
  batchIndex = batchOffsets.length  // Prevent reading any other batch
  val ds = deserializeStream
  deserializeStream = null
  fileStream = null
  ds.close()
  file.delete()
}
{code}
[jira] [Resolved] (SPARK-9154) Implement code generation for StringFormat
[ https://issues.apache.org/jira/browse/SPARK-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-9154.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7546 [https://github.com/apache/spark/pull/7546]

Implement code generation for StringFormat
    Key: SPARK-9154
    URL: https://issues.apache.org/jira/browse/SPARK-9154
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Tarek Auel
    Fix For: 1.5.0
[jira] [Commented] (SPARK-8231) complex function: array_contains
[ https://issues.apache.org/jira/browse/SPARK-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635425#comment-14635425 ]

Pedro Rodriguez commented on SPARK-8231:

I can give this one a shot since I already worked on size, which is somewhat similar.

complex function: array_contains
    Key: SPARK-8231
    URL: https://issues.apache.org/jira/browse/SPARK-8231
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Cheng Hao

array_contains(Array<T>, value): returns TRUE if the array contains value.
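The expected semantics, mirroring Hive's built-in of the same name (a sketch only, since the Spark implementation is what this sub-task adds):

{code}
sqlContext.sql("SELECT array_contains(array(1, 2, 3), 2)")  // true
sqlContext.sql("SELECT array_contains(array(1, 2, 3), 5)")  // false
{code}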
[jira] [Updated] (SPARK-9122) spark.mllib regression should support batch predict
[ https://issues.apache.org/jira/browse/SPARK-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-9122:
    Shepherd: Joseph K. Bradley
    Assignee: Yanbo Liang
    Remaining Estimate: 72h
    Original Estimate: 72h

spark.mllib regression should support batch predict
    Key: SPARK-9122
    URL: https://issues.apache.org/jira/browse/SPARK-9122
    Project: Spark
    Issue Type: Improvement
    Components: MLlib, PySpark
    Reporter: Joseph K. Bradley
    Assignee: Yanbo Liang
    Labels: starter
    Original Estimate: 72h
    Remaining Estimate: 72h

Currently, in spark.mllib, generalized linear regression models like LinearRegressionModel, RidgeRegressionModel and LassoModel support predict() via LinearRegressionModelBase.predict, which only takes single rows (feature vectors). It should support batch prediction, taking an RDD. (See other classes which do this already, such as NaiveBayesModel.)
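A hedged sketch of the requested batch API in terms of the existing single-vector predict ({{lrModel}} and {{data}} are hypothetical placeholders for a trained model and a feature RDD):

{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// what NaiveBayesModel already offers and the regression models should too:
// map the single-vector predict over an RDD of feature vectors
val predictions: RDD[Double] = data.map((v: Vector) => lrModel.predict(v))
{code}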
[jira] [Updated] (SPARK-8481) GaussianMixtureModel predict accepting single vector
[ https://issues.apache.org/jira/browse/SPARK-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-8481:
    Assignee: Dariusz Kobylarz

GaussianMixtureModel predict accepting single vector
    Key: SPARK-8481
    URL: https://issues.apache.org/jira/browse/SPARK-8481
    Project: Spark
    Issue Type: Improvement
    Components: MLlib
    Reporter: Dariusz Kobylarz
    Assignee: Dariusz Kobylarz
    Priority: Minor
    Labels: GaussianMixtureModel, MLlib
    Fix For: 1.5.0
    Original Estimate: 24h
    Remaining Estimate: 24h

GaussianMixtureModel lacks a method to predict a cluster for a single input vector where no spark context would be involved, i.e.

{code}
/** Maps given point to its cluster index. */
def predict(point: Vector): Int
{code}
[jira] [Commented] (SPARK-3157) Avoid duplicated stats in DecisionTree extractLeftRightNodeAggregates
[ https://issues.apache.org/jira/browse/SPARK-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635469#comment-14635469 ]

Joseph K. Bradley commented on SPARK-3157:

Good point, I'll close this. Thanks!

Avoid duplicated stats in DecisionTree extractLeftRightNodeAggregates
    Key: SPARK-3157
    URL: https://issues.apache.org/jira/browse/SPARK-3157
    Project: Spark
    Issue Type: Improvement
    Components: MLlib
    Reporter: Joseph K. Bradley
    Priority: Minor

Improvement: computation, memory usage. For ordered features, extractLeftRightNodeAggregates() computes pairs of cumulative sums. However, these sums are redundant, since they are simply cumulative sums accumulating from the left and right ends, respectively; only one sum needs to be computed. For unordered features, the left and right aggregates are essentially the same data, copied from the original aggregates but shifted by one index; avoid copying the data.
[jira] [Assigned] (SPARK-9224) OnlineLDAOptimizer Performance Improvements
[ https://issues.apache.org/jira/browse/SPARK-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9224:
    Assignee: (was: Apache Spark)

OnlineLDAOptimizer Performance Improvements
    Key: SPARK-9224
    URL: https://issues.apache.org/jira/browse/SPARK-9224
    Project: Spark
    Issue Type: Bug
    Components: MLlib
    Reporter: Feynman Liang
    Priority: Critical

OnlineLDAOptimizer's current implementation can be improved by using in-place updating (instead of reassignment to vars), reducing the number of transpositions, and using an outer product (instead of looping) to collect stats.
[jira] [Assigned] (SPARK-9224) OnlineLDAOptimizer Performance Improvements
[ https://issues.apache.org/jira/browse/SPARK-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9224:
    Assignee: Apache Spark

OnlineLDAOptimizer Performance Improvements
    Key: SPARK-9224
    URL: https://issues.apache.org/jira/browse/SPARK-9224
    Project: Spark
    Issue Type: Bug
    Components: MLlib
    Reporter: Feynman Liang
    Assignee: Apache Spark
    Priority: Critical

OnlineLDAOptimizer's current implementation can be improved by using in-place updating (instead of reassignment to vars), reducing the number of transpositions, and using an outer product (instead of looping) to collect stats.
[jira] [Commented] (SPARK-9154) Implement code generation for StringFormat
[ https://issues.apache.org/jira/browse/SPARK-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635510#comment-14635510 ]

Apache Spark commented on SPARK-9154:

User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/7570

Implement code generation for StringFormat
    Key: SPARK-9154
    URL: https://issues.apache.org/jira/browse/SPARK-9154
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Tarek Auel
    Fix For: 1.5.0
[jira] [Closed] (SPARK-9128) Get outerclasses and objects at the same time in ClosureCleaner
[ https://issues.apache.org/jira/browse/SPARK-9128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-9128.
    Resolution: Fixed
    Fix Version/s: 1.5.0
    Target Version/s: 1.5.0

Get outerclasses and objects at the same time in ClosureCleaner
    Key: SPARK-9128
    URL: https://issues.apache.org/jira/browse/SPARK-9128
    Project: Spark
    Issue Type: Improvement
    Components: Spark Core
    Affects Versions: 1.0.0
    Reporter: Liang-Chi Hsieh
    Assignee: Liang-Chi Hsieh
    Fix For: 1.5.0

Currently, in ClosureCleaner, the outerclasses and objects are retrieved using two different methods. However, the logic of the two methods is the same, and we can get both the outerclasses and objects with a single method call.
[jira] [Updated] (SPARK-7171) Allow for more flexible use of metric sources
[ https://issues.apache.org/jira/browse/SPARK-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-7171:
    Assignee: Jacek Lewandowski

Allow for more flexible use of metric sources
    Key: SPARK-7171
    URL: https://issues.apache.org/jira/browse/SPARK-7171
    Project: Spark
    Issue Type: Improvement
    Components: Spark Core
    Affects Versions: 1.3.1
    Reporter: Jacek Lewandowski
    Assignee: Jacek Lewandowski
    Priority: Minor
    Fix For: 1.5.0

With the current API, the user is allowed to add a custom metric source by providing its class in the metrics configuration. Metrics themselves are provided by Codahale, and multiple metrics can therefore be registered in a single source. Basically, we can break the available metrics into two types: push and pull. By push metrics I mean that some execution code updates the metric by itself, either periodically or every n events; pull metrics, on the other hand, include some function which pulls the data from the execution environment when triggered.

h5. Problem
The metric source is instantiated and registered during initialisation, and the user then has no way to access the instantiated object. It is also almost impossible to access the execution environment of the current task. Therefore, a user who wanted to provide his own {{RDD}} implementation along with a dedicated metrics source would find it very difficult to do so in a safe, concise and elegant way.

h5. Proposed solution
At least for the push metrics, it would be nice to be able to retrieve the metrics source of a particular type, or with a particular id, from {{TaskContext}}. That would allow custom tasks to update various metrics and would greatly improve the usability of metrics. This could be achieved quite easily: since {{TaskContext}} is created by {{Executor}}, which has access to the metrics system, the executor could inject a method to retrieve the particular metrics source. This solution wouldn't change the current API, but just introduce one more method in {{TaskContext}}.
[jira] [Closed] (SPARK-7171) Allow for more flexible use of metric sources
[ https://issues.apache.org/jira/browse/SPARK-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or closed SPARK-7171.
    Resolution: Fixed
    Fix Version/s: 1.5.0
    Target Version/s: 1.5.0

Allow for more flexible use of metric sources
    Key: SPARK-7171
    URL: https://issues.apache.org/jira/browse/SPARK-7171
    Project: Spark
    Issue Type: Improvement
    Components: Spark Core
    Affects Versions: 1.3.1
    Reporter: Jacek Lewandowski
    Assignee: Jacek Lewandowski
    Priority: Minor
    Fix For: 1.5.0

With the current API, the user is allowed to add a custom metric source by providing its class in the metrics configuration. Metrics themselves are provided by Codahale, and multiple metrics can therefore be registered in a single source. Basically, we can break the available metrics into two types: push and pull. By push metrics I mean that some execution code updates the metric by itself, either periodically or every n events; pull metrics, on the other hand, include some function which pulls the data from the execution environment when triggered.

h5. Problem
The metric source is instantiated and registered during initialisation, and the user then has no way to access the instantiated object. It is also almost impossible to access the execution environment of the current task. Therefore, a user who wanted to provide his own {{RDD}} implementation along with a dedicated metrics source would find it very difficult to do so in a safe, concise and elegant way.

h5. Proposed solution
At least for the push metrics, it would be nice to be able to retrieve the metrics source of a particular type, or with a particular id, from {{TaskContext}}. That would allow custom tasks to update various metrics and would greatly improve the usability of metrics. This could be achieved quite easily: since {{TaskContext}} is created by {{Executor}}, which has access to the metrics system, the executor could inject a method to retrieve the particular metrics source. This solution wouldn't change the current API, but just introduce one more method in {{TaskContext}}.
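To make the proposal concrete, a purely hypothetical sketch ({{getMetricsSource}} is the method the ticket asks for, not an existing API):

{code}
import org.apache.spark.TaskContext

// inside a task: look up a registered push-style source and update a metric
val source = TaskContext.get().getMetricsSource("my-rdd-source")  // hypothetical method
source.metricRegistry.counter("records.processed").inc()
{code}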
[jira] [Resolved] (SPARK-5989) Model import/export for LDAModel
[ https://issues.apache.org/jira/browse/SPARK-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley resolved SPARK-5989.
    Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 6948 [https://github.com/apache/spark/pull/6948]

Model import/export for LDAModel
    Key: SPARK-5989
    URL: https://issues.apache.org/jira/browse/SPARK-5989
    Project: Spark
    Issue Type: Sub-task
    Components: MLlib
    Affects Versions: 1.3.0
    Reporter: Joseph K. Bradley
    Assignee: Manoj Kumar
    Fix For: 1.5.0

Add save/load for LDAModel and its local and distributed variants.
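The save/load pattern this adds, assuming the merged PR follows the Saveable/Loader convention used by other mllib models (the path is a placeholder):

{code}
import org.apache.spark.mllib.clustering.DistributedLDAModel

ldaModel.save(sc, "/models/lda")  // ldaModel: a trained distributed LDA model
val restored = DistributedLDAModel.load(sc, "/models/lda")
{code}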
[jira] [Commented] (SPARK-9183) NPE / confusing error message when looking up missing function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635460#comment-14635460 ]

Reynold Xin commented on SPARK-9183:

Actually even that error message is bad — we should throw our own analysis exception here, not let Hive throw it.

NPE / confusing error message when looking up missing function in Spark SQL
    Key: SPARK-9183
    URL: https://issues.apache.org/jira/browse/SPARK-9183
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 1.4.1, 1.5.0
    Reporter: Josh Rosen
    Priority: Blocker

Try running the following query in Spark Shell with Hive enabled:

{code}
sqlContext.sql("select substr(abc, 0, len(ab) - 1)")
{code}

This query is malformed since there's no {{len}} UDF. Unfortunately, though, this gives a really confusing error as of Spark 1.4:

{code}
: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:643)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:652)
	at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUdfs.scala:54)
	at org.apache.spark.sql.hive.HiveContext$$anon$3.org$apache$spark$sql$catalyst$analysis$OverrideFunctionRegistry$$super$lookupFunction(HiveContext.scala:380)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$class.lookupFunction(FunctionRegistry.scala:44)
	at org.apache.spark.sql.hive.HiveContext$$anon$3.lookupFunction(HiveContext.scala:380)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:465)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:463)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:242)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	[...]
{code}

In Spark 1.3, on the other hand, this gives a helpful message:

{code}
: java.lang.RuntimeException: Couldn't find function len
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$1.apply(hiveUdfs.scala:55)
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$1.apply(hiveUdfs.scala:55)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUdfs.scala:54)
	at org.apache.spark.sql.hive.HiveContext$$anon$4.org$apache$spark$sql$catalyst$analysis$OverrideFunctionRegistry$$super$lookupFunction(HiveContext.scala:267)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:43)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:43)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$class.lookupFunction(FunctionRegistry.scala:43)
	at org.apache.spark.sql.hive.HiveContext$$anon$4.lookupFunction(HiveContext.scala:267)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:431)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:429)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)
{code}
[jira] [Commented] (SPARK-9078) Use of non-standard LIMIT keyword in JDBC tableExists code
[ https://issues.apache.org/jira/browse/SPARK-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635480#comment-14635480 ]

Robert Beauchemin commented on SPARK-9078:

That was quick. Not sure that I have all the pieces in place for building right now, is it required? ;-) I was just browsing the source code to figure out what would be required to add and use a new fully supported JDBC-based data source (how all the pieces work) and came across the hardcoded SQL statement.

Use of non-standard LIMIT keyword in JDBC tableExists code
    Key: SPARK-9078
    URL: https://issues.apache.org/jira/browse/SPARK-9078
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 1.3.1, 1.4.0
    Reporter: Robert Beauchemin
    Priority: Minor

tableExists in spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses non-standard SQL (specifically, the LIMIT keyword) to determine whether a table exists in a JDBC data source. This will cause an exception in many/most JDBC databases that don't support the LIMIT keyword. See http://stackoverflow.com/questions/1528604/how-universal-is-the-limit-statement-in-sql

To check for table existence or an exception, it could be recrafted around "select 1 from $table where 0 = 1", which isn't quite the same (it returns an empty result set rather than the value '1') but would support more data sources, and also supports empty tables. Arguably ugly, and it possibly scans every row on sources that don't support constant folding, but better than failing on JDBC sources that don't support LIMIT. Perhaps "supports LIMIT" could be a field in the JdbcDialect class for dialects that support the keyword to override. The ANSI standard is (OFFSET and) FETCH.

The standard way to check for table existence would be to use information_schema.tables, which is a SQL standard but may not work for JDBC data sources that support SQL but not the information_schema. The JDBC DatabaseMetaData interface provides getSchemas(), which allows checking for the information_schema in drivers that support it.
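A sketch of the portable existence probe suggested in the description, in plain JDBC ({{conn}} and {{table}} are caller-supplied):

{code}
import java.sql.Connection

// returns an empty result set if the table exists; throws if it does not
def tableExists(conn: Connection, table: String): Boolean =
  try {
    val stmt = conn.prepareStatement(s"SELECT 1 FROM $table WHERE 0 = 1")
    try { stmt.executeQuery(); true } finally { stmt.close() }
  } catch {
    case _: java.sql.SQLException => false
  }
{code}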
[jira] [Assigned] (SPARK-9024) Unsafe HashJoin
[ https://issues.apache.org/jira/browse/SPARK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu reassigned SPARK-9024:
    Assignee: Davies Liu

Unsafe HashJoin
    Key: SPARK-9024
    URL: https://issues.apache.org/jira/browse/SPARK-9024
    Project: Spark
    Issue Type: New Feature
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Davies Liu

Create a version of BroadcastJoin that accepts UnsafeRow as inputs, and outputs UnsafeRow as outputs.
[jira] [Commented] (SPARK-7105) Support model save/load in Python's GaussianMixture
[ https://issues.apache.org/jira/browse/SPARK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635424#comment-14635424 ] Manoj Kumar commented on SPARK-7105: Hi, Are you still working on this? Support model save/load in Python's GaussianMixture --- Key: SPARK-7105 URL: https://issues.apache.org/jira/browse/SPARK-7105 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Joseph K. Bradley Assignee: Yu Ishikawa Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9223) Support model save/load in Python's LDA
Manoj Kumar created SPARK-9223: -- Summary: Support model save/load in Python's LDA Key: SPARK-9223 URL: https://issues.apache.org/jira/browse/SPARK-9223 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Manoj Kumar Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9078) Use of non-standard LIMIT keyword in JDBC tableExists code
[ https://issues.apache.org/jira/browse/SPARK-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635459#comment-14635459 ] Robert Beauchemin commented on SPARK-9078: -- Great, I didn't realize that JdbcDialects.registerDialect was a public API; passing it through to the jdbc data source would do it. Cheers, and thanks, Bob Use of non-standard LIMIT keyword in JDBC tableExists code -- Key: SPARK-9078 URL: https://issues.apache.org/jira/browse/SPARK-9078 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1, 1.4.0 Reporter: Robert Beauchemin Priority: Minor tableExists in spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses non-standard SQL (specifically, the LIMIT keyword) to determine whether a table exists in a JDBC data source. This will cause an exception in many/most JDBC databases that don't support the LIMIT keyword. See http://stackoverflow.com/questions/1528604/how-universal-is-the-limit-statement-in-sql To check for table existence or an exception, it could be recrafted around {{select 1 from $table where 0 = 1}}, which isn't the same (it returns an empty resultset rather than the value '1') but would support more data sources and also support empty tables. It's arguably ugly and possibly queries every row on sources that don't support constant folding, but that is better than failing on JDBC sources that don't support LIMIT. Perhaps 'supports LIMIT' could be a field in the JdbcDialect class for dialects that support the keyword to override. The ANSI standard is (OFFSET and) FETCH. The standard way to check for table existence would be to use information_schema.tables, which is part of the SQL standard but may not work for JDBC data sources that support SQL but not information_schema. The JDBC DatabaseMetaData interface provides getSchemas(), which allows checking for the information_schema in drivers that support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9224) OnlineLDAOptimizer Performance Improvements
Feynman Liang created SPARK-9224: Summary: OnlineLDAOptimizer Performance Improvements Key: SPARK-9224 URL: https://issues.apache.org/jira/browse/SPARK-9224 Project: Spark Issue Type: Bug Components: MLlib Reporter: Feynman Liang Priority: Critical OnlineLDAOptimizer's current implementation can be improved by using in-place updates (instead of reassignment to vars), reducing the number of transpositions, and using an outer product (instead of looping) to collect stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
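To make the first and last points concrete, a hedged Breeze sketch (illustrative shapes, not the actual OnlineLDAOptimizer code): the reassignment form allocates a fresh k x v matrix for the running statistics on every document, while the in-place outer-product form accumulates into the existing buffer.
{code}
import breeze.linalg.{DenseMatrix, DenseVector}

val k = 10    // number of topics (illustrative)
val v = 1000  // vocabulary size (illustrative)
var stats = DenseMatrix.zeros[Double](k, v)

val gamma = DenseVector.rand(k)  // per-document topic weights
val cts = DenseVector.rand(v)    // word counts for one document

// Reassignment: builds stats + outer(gamma, cts) as a brand-new k x v matrix.
stats = stats + gamma * cts.t

// In-place outer-product update: accumulates into the existing buffer.
stats :+= gamma * cts.t
{code}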
[jira] [Created] (SPARK-9225) LDASuite needs unit tests for empty documents
Feynman Liang created SPARK-9225: Summary: LDASuite needs unit tests for empty documents Key: SPARK-9225 URL: https://issues.apache.org/jira/browse/SPARK-9225 Project: Spark Issue Type: Test Components: MLlib Reporter: Feynman Liang Priority: Minor We need to add a unit test to {{LDASuite}} which checks that empty documents are handled appropriately without crashing. This would require defining an empty corpus within {{LDASuite}} and adding tests for the available LDA optimizers (currently EM and Online). Note that only {{SparseVector}}s can be empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
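A hedged sketch of what such a test could look like, assuming the spark.mllib LDA API and a SparkContext {{sc}} from the test fixture:
{code}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val vocabSize = 10

// A corpus containing an empty document; only SparseVectors can be empty.
val corpus = sc.parallelize(Seq[(Long, Vector)](
  (0L, Vectors.sparse(vocabSize, Array.empty[Int], Array.empty[Double])),
  (1L, Vectors.dense(Array.fill(vocabSize)(1.0)))
))

// Both optimizers should handle the empty document without crashing.
for (optimizer <- Seq("em", "online")) {
  val model = new LDA().setK(2).setOptimizer(optimizer).setMaxIterations(5).run(corpus)
  assert(model.vocabSize == vocabSize)
}
{code}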
[jira] [Updated] (SPARK-9224) OnlineLDAOptimizer Performance Improvements
[ https://issues.apache.org/jira/browse/SPARK-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-9224: - Priority: Major (was: Critical) Issue Type: Improvement (was: Bug) OnlineLDAOptimizer Performance Improvements --- Key: SPARK-9224 URL: https://issues.apache.org/jira/browse/SPARK-9224 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang OnlineLDAOptimizer's current implementation can be improved by using in-place updates (instead of reassignment to vars), reducing the number of transpositions, and using an outer product (instead of looping) to collect stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4598. Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 1.5.0 Target Version/s: 1.5.0 Paginate stage page to avoid OOM with 100,000 tasks - Key: SPARK-4598 URL: https://issues.apache.org/jira/browse/SPARK-4598 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 1.2.0 Reporter: meiyoula Assignee: Shixiong Zhu Fix For: 1.5.0 On the HistoryServer stage page, clicking the task link in the Description column triggers a GC error. The detailed error message is: 2014-11-17 16:36:30,851 | WARN | [qtp1083955615-352] | Error for /history/application_1416206401491_0010/stages/stage/ | org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:590) java.lang.OutOfMemoryError: GC overhead limit exceeded 2014-11-17 16:36:30,851 | WARN | [qtp1083955615-364] | handle failed | org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:697) java.lang.OutOfMemoryError: GC overhead limit exceeded -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9165) Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-9165: Shepherd: Michael Armbrust Implement code generation for CreateArray, CreateStruct, and CreateNamedStruct -- Key: SPARK-9165 URL: https://issues.apache.org/jira/browse/SPARK-9165 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Yijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9222) Make class instantiation variables in DistributedLDAModel [private] clustering
Manoj Kumar created SPARK-9222: -- Summary: Make class instantiation variables in DistributedLDAModel [private] clustering Key: SPARK-9222 URL: https://issues.apache.org/jira/browse/SPARK-9222 Project: Spark Issue Type: Test Components: MLlib Reporter: Manoj Kumar Priority: Minor This would enable testing the various class variables, like docConcentration, topicConcentration, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9078) Use of non-standard LIMIT keyword in JDBC tableExists code
[ https://issues.apache.org/jira/browse/SPARK-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635461#comment-14635461 ] Reynold Xin commented on SPARK-9078: Want to submit a pull request? :) Use of non-standard LIMIT keyword in JDBC tableExists code -- Key: SPARK-9078 URL: https://issues.apache.org/jira/browse/SPARK-9078 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1, 1.4.0 Reporter: Robert Beauchemin Priority: Minor tableExists in spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses non-standard SQL (specifically, the LIMIT keyword) to determine whether a table exists in a JDBC data source. This will cause an exception in many/most JDBC databases that don't support the LIMIT keyword. See http://stackoverflow.com/questions/1528604/how-universal-is-the-limit-statement-in-sql To check for table existence or an exception, it could be recrafted around {{select 1 from $table where 0 = 1}}, which isn't the same (it returns an empty resultset rather than the value '1') but would support more data sources and also support empty tables. It's arguably ugly and possibly queries every row on sources that don't support constant folding, but that is better than failing on JDBC sources that don't support LIMIT. Perhaps 'supports LIMIT' could be a field in the JdbcDialect class for dialects that support the keyword to override. The ANSI standard is (OFFSET and) FETCH. The standard way to check for table existence would be to use information_schema.tables, which is part of the SQL standard but may not work for JDBC data sources that support SQL but not information_schema. The JDBC DatabaseMetaData interface provides getSchemas(), which allows checking for the information_schema in drivers that support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-9154) Implement code generation for StringFormat
[ https://issues.apache.org/jira/browse/SPARK-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-9154: - Reopening since this broke the build. Implement code generation for StringFormat -- Key: SPARK-9154 URL: https://issues.apache.org/jira/browse/SPARK-9154 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Tarek Auel Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9122) spark.mllib regression should support batch predict
[ https://issues.apache.org/jira/browse/SPARK-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635454#comment-14635454 ] Joseph K. Bradley commented on SPARK-9122: -- OK thank you! spark.mllib regression should support batch predict --- Key: SPARK-9122 URL: https://issues.apache.org/jira/browse/SPARK-9122 Project: Spark Issue Type: Improvement Components: MLlib, PySpark Reporter: Joseph K. Bradley Labels: starter Currently, in spark.mllib, generalized linear regression models like LinearRegressionModel, RidgeRegressionModel and LassoModel support predict() via {{LinearRegressionModelBase.predict}}, which only takes single rows (feature vectors). It should support batch prediction, taking an RDD. (See other classes which already do this, such as NaiveBayesModel.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
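A hedged sketch of the requested overload, mirroring the batch-predict pattern of NaiveBayesModel; the trait below is a simplified stand-in, not the actual spark.mllib class hierarchy:
{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

trait RegressionModelLike extends Serializable {
  // Existing single-row prediction.
  def predict(testData: Vector): Double

  // Requested batch overload: map the single-row predict over an RDD,
  // shipping the model to executors once per task rather than per record.
  def predict(testData: RDD[Vector]): RDD[Double] = {
    val model = this
    testData.map(v => model.predict(v))
  }
}
{code}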
[jira] [Resolved] (SPARK-8481) GaussianMixtureModel predict accepting single vector
[ https://issues.apache.org/jira/browse/SPARK-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-8481. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6906 [https://github.com/apache/spark/pull/6906] GaussianMixtureModel predict accepting single vector Key: SPARK-8481 URL: https://issues.apache.org/jira/browse/SPARK-8481 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Dariusz Kobylarz Priority: Minor Labels: GaussianMixtureModel, MLlib Fix For: 1.5.0 Original Estimate: 24h Remaining Estimate: 24h GaussianMixtureModel lacks a method to predict the cluster for a single input vector without involving a Spark context, i.e. {{/** Maps given point to its cluster index. */ def predict(point: Vector): Int}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3157) Avoid duplicated stats in DecisionTree extractLeftRightNodeAggregates
[ https://issues.apache.org/jira/browse/SPARK-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-3157. Resolution: Fixed Assignee: Joseph K. Bradley Fix Version/s: 1.2.0 Avoid duplicated stats in DecisionTree extractLeftRightNodeAggregates - Key: SPARK-3157 URL: https://issues.apache.org/jira/browse/SPARK-3157 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley Priority: Minor Fix For: 1.2.0 Improvement: computation, memory usage For ordered features, extractLeftRightNodeAggregates() computes pairs of cumulative sums. However, these sums are redundant since they are simply cumulative sums accumulating from the left and right ends, respectively. Only compute one sum. For unordered features, the left and right aggregates are essentially the same data, copied from the original aggregates, but shifted by one index. Avoid copying data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
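To make the redundancy for ordered features concrete, a small standalone example (illustrative numbers, not the DecisionTree code): the right-accumulating sums are fully determined by the total and the left prefix sums, so only one cumulative pass is needed.
{code}
val stats = Array(3.0, 1.0, 4.0, 1.0, 5.0)

// Left cumulative sums: leftSum(i) = stats(0) + ... + stats(i)
val leftSum = stats.scanLeft(0.0)(_ + _).tail
val total = leftSum.last

// Right cumulative sums need no second pass or second array:
// rightSum(i) = stats(i) + ... + stats(last) = total - leftSum(i - 1)
def rightSum(i: Int): Double = if (i == 0) total else total - leftSum(i - 1)

assert(rightSum(2) == 4.0 + 1.0 + 5.0)
{code}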
[jira] [Commented] (SPARK-9224) OnlineLDAOptimizer Performance Improvements
[ https://issues.apache.org/jira/browse/SPARK-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635496#comment-14635496 ] Apache Spark commented on SPARK-9224: - User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/7454 OnlineLDAOptimizer Performance Improvements --- Key: SPARK-9224 URL: https://issues.apache.org/jira/browse/SPARK-9224 Project: Spark Issue Type: Bug Components: MLlib Reporter: Feynman Liang Priority: Critical OnlineLDAOptimizer's current implementation can be improved by using in-place updates (instead of reassignment to vars), reducing the number of transpositions, and using an outer product (instead of looping) to collect stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9238) two extra useless entries for bytesOfCodePointInUTF8
[ https://issues.apache.org/jira/browse/SPARK-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636206#comment-14636206 ] Apache Spark commented on SPARK-9238: - User 'zhichao-li' has created a pull request for this issue: https://github.com/apache/spark/pull/7582 two extra useless entries for bytesOfCodePointInUTF8 Key: SPARK-9238 URL: https://issues.apache.org/jira/browse/SPARK-9238 Project: Spark Issue Type: Bug Components: SQL Reporter: zhichao-li Priority: Trivial Only a trivial thing; I'm not sure if I understand correctly, but I guess only 2 entries in bytesOfCodePointInUTF8 are needed for the case of 6-byte code points (lead byte 1111110x). Details can be found at https://en.wikipedia.org/wiki/UTF-8 in the Description section. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9238) two extra useless entries for bytesOfCodePointInUTF8
zhichao-li created SPARK-9238: - Summary: two extra useless entries for bytesOfCodePointInUTF8 Key: SPARK-9238 URL: https://issues.apache.org/jira/browse/SPARK-9238 Project: Spark Issue Type: Bug Components: SQL Reporter: zhichao-li Priority: Trivial Only a trivial thing; I'm not sure if I understand correctly, but I guess only 2 entries in bytesOfCodePointInUTF8 are needed for the case of 6-byte code points (lead byte 1111110x). Details can be found at https://en.wikipedia.org/wiki/UTF-8 in the Description section. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
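For context, a hedged sketch of the lead-byte-to-length mapping such a table encodes (a standalone illustration, not Spark's actual array): 6-byte sequences use lead bytes of the form 1111110x, i.e. only 0xFC and 0xFD, which is why two entries suffice.
{code}
// UTF-8 sequence length implied by the lead byte (see the Wikipedia table).
def bytesForLeadByte(b: Int): Int = b match {
  case x if (x & 0x80) == 0x00 => 1 // 0xxxxxxx
  case x if (x & 0xE0) == 0xC0 => 2 // 110xxxxx
  case x if (x & 0xF0) == 0xE0 => 3 // 1110xxxx
  case x if (x & 0xF8) == 0xF0 => 4 // 11110xxx
  case x if (x & 0xFC) == 0xF8 => 5 // 111110xx
  case x if (x & 0xFE) == 0xFC => 6 // 1111110x: only 0xFC and 0xFD
  case _ => 1 // continuation or invalid lead byte; real code must handle this
}

// Exactly two lead-byte values map to 6-byte sequences.
assert((0x00 to 0xFF).count(bytesForLeadByte(_) == 6) == 2)
{code}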
[jira] [Created] (SPARK-9240) Hybrid aggregate operator
Yin Huai created SPARK-9240: --- Summary: Hybrid aggregate operator Key: SPARK-9240 URL: https://issues.apache.org/jira/browse/SPARK-9240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker We need a hybrid aggregate operator, which first tries hash-based aggregation and gracefully switches to sort-based aggregation if the hash map's memory footprint exceeds a given threshold (how to track the memory footprint and how to set the threshold?). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
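A heavily simplified, hedged sketch of the control flow being proposed, using local collections in place of the unsafe hash map and spilling machinery (the real operator would track memory through the task memory manager rather than an entry count):
{code}
// Sketch: hash-based aggregation with a sort-based fallback.
def hybridAggregate[K: Ordering, V](
    input: Iterator[(K, V)],
    merge: (V, V) => V,
    maxHashEntries: Int): Iterator[(K, V)] = {
  val hash = scala.collection.mutable.HashMap.empty[K, V]
  var deferred = List.empty[(K, V)]
  for ((k, v) <- input) {
    if (hash.contains(k) || hash.size < maxHashEntries) {
      hash(k) = hash.get(k).map(merge(_, v)).getOrElse(v)
    } else {
      deferred ::= (k -> v) // threshold exceeded: defer to the sort-based path
    }
  }
  // Sort-based path: order the deferred records together with the partial
  // aggregates from the hash map, then merge the values of equal keys.
  (hash.iterator ++ deferred.iterator).toSeq
    .sortBy(_._1)
    .groupBy(_._1)
    .map { case (k, kvs) => k -> kvs.map(_._2).reduce(merge) }
    .iterator
}
{code}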
[jira] [Updated] (SPARK-9240) Hybrid aggregate operator using unsafe row
[ https://issues.apache.org/jira/browse/SPARK-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-9240: Summary: Hybrid aggregate operator using unsafe row (was: Hybrid aggregate operator) Hybrid aggregate operator using unsafe row -- Key: SPARK-9240 URL: https://issues.apache.org/jira/browse/SPARK-9240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker We need a hybrid aggregate operator, which first tries hash-based aggregation and gracefully switches to sort-based aggregation if the hash map's memory footprint exceeds a given threshold (how to track the memory footprint and how to set the threshold?). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9244) Increase some default memory limits
Matei Zaharia created SPARK-9244: Summary: Increase some default memory limits Key: SPARK-9244 URL: https://issues.apache.org/jira/browse/SPARK-9244 Project: Spark Issue Type: Improvement Reporter: Matei Zaharia Assignee: Matei Zaharia There are a few memory limits that people hit often and that we could make higher, especially now that memory sizes have grown. - spark.akka.frameSize: This defaults to 10 but is often hit for map output statuses in large shuffles. AFAIK the memory is not fully allocated up-front, so we can just make this larger and still not affect jobs that never send a status that large. - spark.executor.memory: Defaults to 512m, which is really small. We can at least increase it to 1g, though this is something users do need to set on their own. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
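For reference, this is how the two settings are overridden per-application today; the values below are illustrative, while the ticket proposes raising the shipped defaults so users would not need to do this:
{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.akka.frameSize", "64")   // MB; raised for large map output statuses
  .set("spark.executor.memory", "1g")  // up from the old 512m default

val sc = new SparkContext(conf)
{code}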
[jira] [Commented] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636060#comment-14636060 ] Shivaram Venkataraman commented on SPARK-9053: -- Yeah - there are a bunch of real issues to be fixed first, and we can discuss the ignore rule after that. Also, I don't think we should ignore all warnings of this form -- just, say, on the `^` operator, or we can mark out portions of the code that need to be ignored, etc. Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8641) Native Spark Window Functions
[ https://issues.apache.org/jira/browse/SPARK-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-8641: - Description: *Rationale* The window operator currently uses Hive UDAFs for all aggregation operations. This is fine in terms of performance and functionality. However, they limit extensibility, and they are quite opaque in terms of processing and memory usage. The latter blocks advanced optimizations such as code generation and tungsten-style (advanced) memory management. *Requirements* We want to address this by replacing the Hive UDAFs with native Spark SQL UDAFs. A redesign of the Spark UDAFs is currently underway, see SPARK-4366. The new window UDAFs should use this new standard, in order to make them as future-proof as possible. Although we are replacing the standard Hive UDAFs, other existing Hive UDAFs should still be supported. The new window UDAFs should, at least, cover all existing Hive standard window UDAFs: # FIRST_VALUE # LAST_VALUE # LEAD # LAG # ROW_NUMBER # RANK # DENSE_RANK # PERCENT_RANK # NTILE # CUME_DIST All these functions imply a row order; this means that in order to use these functions properly, an ORDER BY clause must be defined. The first and last value UDAFs are already present in Spark SQL. The only thing which needs to be added is skip-NULL functionality. LEAD and LAG are not aggregates. These expressions return the value of an expression a number of rows before (LAG) or ahead (LEAD) of the current row. These expressions put a constraint on the window frame in which they are executed: this can only be a row frame with equal offsets. The ROW_NUMBER() function can be seen as a count in a running row frame (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). RANK(), DENSE_RANK(), PERCENT_RANK(), NTILE(..) and CUME_DIST() are dependent on the actual values in the ORDER BY clause. The ORDER BY expression(s) must be made available before these functions are evaluated. All these functions will have a fixed frame, but this will be dependent on the implementation (probably a running row frame). PERCENT_RANK(), NTILE(..) and CUME_DIST() are also dependent on the size of the partition being evaluated. The partition size must either be made available during evaluation (this is perfectly feasible in the current implementation) or the expression must be divided over two window functions and a merging expression; for instance, PERCENT_RANK() would look like this: {noformat} (RANK() OVER (PARTITION BY x ORDER BY y) - 1) / (COUNT(*) OVER (PARTITION BY x) - 1) {noformat} *Design* The old WindowFunction interface will be replaced by the following (initial/very early) design (including sub-classes): {noformat}
/**
 * A window function is a function that can only be evaluated in the context of a window operator.
 */
trait WindowFunction { self: Expression =>
  /**
   * Define the frame in which the window operator must be executed.
   */
  def frame: WindowFrame = UnspecifiedFrame
}

/**
 * Base class for LEAD/LAG offset window functions.
 *
 * These are ordinary expressions; the idea is that the Window operator will process these in a
 * separate (specialized) window frame.
 */
abstract class OffsetWindowFunction(val child: Expression, val offset: Int, val default: Expression) {
  override def deterministic: Boolean = false
  ...
}

case class Lead(child: Expression, offset: Int, default: Expression)
    extends OffsetWindowFunction(child, offset, default) {
  override val frame = SpecifiedWindowFrame(RowFrame, ValueFollowing(offset), ValueFollowing(offset))
  ...
}

case class Lag(child: Expression, offset: Int, default: Expression)
    extends OffsetWindowFunction(child, offset, default) {
  override val frame = SpecifiedWindowFrame(RowFrame, ValuePreceding(offset), ValuePreceding(offset))
  ...
}

case class RowNumber() extends AlgebraicAggregate with WindowFunction {
  override def deterministic: Boolean = false
  override val frame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)
  ...
}

abstract class RankLike(val order: Seq[Expression] = Nil) extends AlgebraicAggregate with WindowFunction {
  override def deterministic: Boolean = true
  // This can be injected by either the Planner or the Window operator.
  def withOrderSpec(orderSpec: Seq[Expression]): AggregateWindowFunction
  // This will be injected by the Window operator.
  // Only needed by: PERCENT_RANK(), NTILE(..) and CUME_DIST(). Maybe put this in a subclass.
  def withPartitionSize(size: MutableLiteral): AggregateWindowFunction
  // We can do this as long as partition size is available before execution...
  override val frame = SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow)
  ...
}

case class Rank(order: Seq[Expression] = Nil) extends RankLike(order) { ... }
case class DenseRank(order: Seq[Expression] = Nil) extends RankLike(order) { ... }
case class
[jira] [Updated] (SPARK-7368) add QR decomposition for RowMatrix
[ https://issues.apache.org/jira/browse/SPARK-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7368: - Shepherd: Xiangrui Meng add QR decomposition for RowMatrix -- Key: SPARK-7368 URL: https://issues.apache.org/jira/browse/SPARK-7368 Project: Spark Issue Type: Improvement Components: MLlib Reporter: yuhao yang Original Estimate: 48h Remaining Estimate: 48h Add QR decomposition for RowMatrix. There's a great distributed algorithm for QR decomposition, which I'm currently referring to. Austin R. Benson, David F. Gleich, James Demmel. Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures, 2013 IEEE International Conference on Big Data -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7368) add QR decomposition for RowMatrix
[ https://issues.apache.org/jira/browse/SPARK-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7368: - Assignee: yuhao yang add QR decomposition for RowMatrix -- Key: SPARK-7368 URL: https://issues.apache.org/jira/browse/SPARK-7368 Project: Spark Issue Type: Improvement Components: MLlib Reporter: yuhao yang Assignee: yuhao yang Original Estimate: 48h Remaining Estimate: 48h Add QR decomposition for RowMatrix. There's a great distributed algorithm for QR decomposition, which I'm currently referring to. Austin R. Benson, David F. Gleich, James Demmel. Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures, 2013 IEEE International Conference on Big Data -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7368) add QR decomposition for RowMatrix
[ https://issues.apache.org/jira/browse/SPARK-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7368: - Issue Type: New Feature (was: Improvement) add QR decomposition for RowMatrix -- Key: SPARK-7368 URL: https://issues.apache.org/jira/browse/SPARK-7368 Project: Spark Issue Type: New Feature Components: MLlib Reporter: yuhao yang Assignee: yuhao yang Original Estimate: 48h Remaining Estimate: 48h Add QR decomposition for RowMatrix. There's a great distributed algorithm for QR decomposition, which I'm currently referring to. Austin R. Benson, David F. Gleich, James Demmel. Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures, 2013 IEEE International Conference on Big Data -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
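The cited paper's tall-and-skinny QR (TSQR) idea, in a hedged Breeze sketch rather than the eventual RowMatrix implementation: factor each partition's block locally, then repeatedly stack and re-factor the small R factors. The final R equals the R of the full matrix, and Q can then be recovered as A * inv(R) in the paper's indirect variant.
{code}
import breeze.linalg.{DenseMatrix, qr}
import org.apache.spark.rdd.RDD

// Tall-and-skinny QR: compute R without materializing the full matrix.
// Assumes every block has the same (small) number of columns.
def tsqrR(blocks: RDD[DenseMatrix[Double]]): DenseMatrix[Double] = {
  blocks
    .map(block => qr.reduced(block).r) // local QR per partition
    .reduce { (a, b) =>
      // Stack two R factors and re-factor; keep only the new R.
      qr.reduced(DenseMatrix.vertcat(a, b)).r
    }
}
{code}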
[jira] [Created] (SPARK-9239) HiveUDAF support for AggregateFunction2
Yin Huai created SPARK-9239: --- Summary: HiveUDAF support for AggregateFunction2 Key: SPARK-9239 URL: https://issues.apache.org/jira/browse/SPARK-9239 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker We need to build a wrapper for Hive UDAFs on top of AggregateFunction2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
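Very roughly, such a wrapper would adapt Hive's buffer lifecycle (new buffer / iterate / merge / terminate) to the new interface's initialize / update / merge / eval callbacks. A schematic sketch with illustrative, simplified interfaces on both sides; the real AggregateFunction2 and Hive evaluator APIs differ in detail (e.g. they work with ObjectInspectors and mutable rows):
{code}
// Both traits below are simplified stand-ins for this sketch.
trait HiveUDAFEvaluatorLike {
  type AggBuffer
  def getNewAggregationBuffer(): AggBuffer
  def iterate(buf: AggBuffer, args: Seq[Any]): Unit
  def merge(buf: AggBuffer, partial: AggBuffer): Unit
  def terminate(buf: AggBuffer): Any
}

abstract class AggregateFunction2Like {
  def initialize(): Any
  def update(buffer: Any, input: Seq[Any]): Any
  def merge(buffer1: Any, buffer2: Any): Any
  def eval(buffer: Any): Any
}

// The wrapper forwards each Spark-side callback to the Hive evaluator.
class HiveUDAFWrapper(val evaluator: HiveUDAFEvaluatorLike) extends AggregateFunction2Like {
  private def buf(b: Any) = b.asInstanceOf[evaluator.AggBuffer]
  override def initialize(): Any = evaluator.getNewAggregationBuffer()
  override def update(buffer: Any, input: Seq[Any]): Any = {
    evaluator.iterate(buf(buffer), input); buffer
  }
  override def merge(b1: Any, b2: Any): Any = {
    evaluator.merge(buf(b1), buf(b2)); b1
  }
  override def eval(buffer: Any): Any = evaluator.terminate(buf(buffer))
}
{code}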
[jira] [Commented] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636321#comment-14636321 ] Apache Spark commented on SPARK-9053: - User 'yu-iskw' has created a pull request for this issue: https://github.com/apache/spark/pull/7584 Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9053: --- Assignee: (was: Apache Spark) Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9121. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7567 [https://github.com/apache/spark/pull/7567] Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Fix For: 1.5.0 We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9121: - Assignee: Yu Ishikawa Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Assignee: Yu Ishikawa Fix For: 1.5.0 We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9236) Left Outer Join with empty JavaPairRDD returns empty RDD
Vitalii Slobodianyk created SPARK-9236: -- Summary: Left Outer Join with empty JavaPairRDD returns empty RDD Key: SPARK-9236 URL: https://issues.apache.org/jira/browse/SPARK-9236 Project: Spark Issue Type: Bug Affects Versions: 1.4.1, 1.3.1 Reporter: Vitalii Slobodianyk When the *left outer join* is performed on a non-empty {{JavaPairRDD}} with a {{JavaPairRDD}} which was created with the {{emptyRDD()}} method, the resulting RDD is empty. In the following unit test the last assert fails.
{code}
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;

import lombok.val;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.Test;

import scala.Tuple2;

public class SparkTest {

  @Test
  public void joinEmptyRDDTest() {
    val sparkConf = new SparkConf().setAppName("test").setMaster("local");
    try (val sparkContext = new JavaSparkContext(sparkConf)) {
      val oneRdd = sparkContext.parallelize(Collections.singletonList("one"));
      val twoRdd = sparkContext.parallelize(Collections.singletonList("two"));
      val threeRdd = sparkContext.emptyRDD();
      val onePair = oneRdd.mapToPair(t -> new Tuple2<Integer, String>(1, t));
      val twoPair = twoRdd.groupBy(t -> 1);
      val threePair = threeRdd.groupBy(t -> 1);
      assertThat(onePair.leftOuterJoin(twoPair).collect()).isNotEmpty();
      assertThat(onePair.leftOuterJoin(threePair).collect()).isNotEmpty();
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9238) two extra useless entries for bytesOfCodePointInUTF8
[ https://issues.apache.org/jira/browse/SPARK-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9238: --- Assignee: (was: Apache Spark) two extra useless entries for bytesOfCodePointInUTF8 Key: SPARK-9238 URL: https://issues.apache.org/jira/browse/SPARK-9238 Project: Spark Issue Type: Bug Components: SQL Reporter: zhichao-li Priority: Trivial Only a trivial thing; I'm not sure if I understand correctly, but I guess only 2 entries in bytesOfCodePointInUTF8 are needed for the case of 6-byte code points (lead byte 1111110x). Details can be found at https://en.wikipedia.org/wiki/UTF-8 in the Description section. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9238) two extra useless entries for bytesOfCodePointInUTF8
[ https://issues.apache.org/jira/browse/SPARK-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9238: --- Assignee: Apache Spark two extra useless entries for bytesOfCodePointInUTF8 Key: SPARK-9238 URL: https://issues.apache.org/jira/browse/SPARK-9238 Project: Spark Issue Type: Bug Components: SQL Reporter: zhichao-li Assignee: Apache Spark Priority: Trivial Only a trivial thing; I'm not sure if I understand correctly, but I guess only 2 entries in bytesOfCodePointInUTF8 are needed for the case of 6-byte code points (lead byte 1111110x). Details can be found at https://en.wikipedia.org/wiki/UTF-8 in the Description section. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8232) complex function: sort_array
[ https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636146#comment-14636146 ] Apache Spark commented on SPARK-8232: - User 'chenghao-intel' has created a pull request for this issue: https://github.com/apache/spark/pull/7581 complex function: sort_array Key: SPARK-8232 URL: https://issues.apache.org/jira/browse/SPARK-8232 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Cheng Hao sort_array(Array<T>): Sorts the input array in ascending order according to the natural ordering of the array elements and returns it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
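Expected behavior in a hedged example, assuming the function also gets exposed through {{org.apache.spark.sql.functions}} once implemented ({{sqlContext}} and the DataFrame are illustrative):
{code}
import org.apache.spark.sql.functions.sort_array

val df = sqlContext.createDataFrame(Seq((1, Seq(3, 1, 2)))).toDF("id", "xs")

// Natural ascending order of the element type: [3, 1, 2] => [1, 2, 3]
df.select(sort_array(df("xs"))).show()
{code}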
[jira] [Assigned] (SPARK-8232) complex function: sort_array
[ https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8232: --- Assignee: Apache Spark (was: Cheng Hao) complex function: sort_array Key: SPARK-8232 URL: https://issues.apache.org/jira/browse/SPARK-8232 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark sort_array(Array<T>): Sorts the input array in ascending order according to the natural ordering of the array elements and returns it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8232) complex function: sort_array
[ https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8232: --- Assignee: Cheng Hao (was: Apache Spark) complex function: sort_array Key: SPARK-8232 URL: https://issues.apache.org/jira/browse/SPARK-8232 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Cheng Hao sort_array(Array<T>): Sorts the input array in ascending order according to the natural ordering of the array elements and returns it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4366) Aggregation Improvement
[ https://issues.apache.org/jira/browse/SPARK-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-4366: Priority: Critical (was: Major) Aggregation Improvement --- Key: SPARK-4366 URL: https://issues.apache.org/jira/browse/SPARK-4366 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Priority: Critical Attachments: aggregatefunction_v1.pdf This improvement actually includes a couple of sub-tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9053: --- Assignee: Apache Spark Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Apache Spark We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9210) checkValidAggregateExpression() throws exceptions with bad error messages
[ https://issues.apache.org/jira/browse/SPARK-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636115#comment-14636115 ] Simeon Simeonov commented on SPARK-9210: Standalone test demonstrating the problem; spark-shell output: https://gist.github.com/ssimeonov/72c8a9b01f99e35ba470 checkValidAggregateExpression() throws exceptions with bad error messages - Key: SPARK-9210 URL: https://issues.apache.org/jira/browse/SPARK-9210 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Environment: N/A Reporter: Simeon Simeonov Priority: Trivial When a result column in {{SELECT ... GROUP BY}} is neither one of the {{GROUP BY}} expressions nor uses an aggregation function, {{org.apache.spark.sql.catalyst.analysis.CheckAnalysis}} throws {{org.apache.spark.sql.AnalysisException}} with the message expression '_column expression_' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get. The remedy suggestion in the exception message is incorrect: the function name is {{first_value}}, not {{first}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
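The mismatch is easy to reproduce with a hedged snippet (table and column names invented): 'b' below is neither grouped nor aggregated, so analysis fails and the message suggests wrapping in first(), while the SQL function registered at the time was first_value.
{code}
// Throws AnalysisException: expression 'b' is neither present in the group by,
// nor is it an aggregate function. Add to group by or wrap in first() ...
sqlContext.sql("SELECT a, b FROM t GROUP BY a")
{code}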
[jira] [Assigned] (SPARK-4367) Partial aggregation support the DISTINCT aggregation
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-4367: --- Assignee: (was: Apache Spark) Partial aggregation support the DISTINCT aggregation Key: SPARK-4367 URL: https://issues.apache.org/jira/browse/SPARK-4367 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Cheng Hao Most aggregate functions (e.g. average) with distinct values require all of the records in the same group to be shuffled to a single node. However, as part of the optimization, those records can be partially aggregated before shuffling, which probably reduces the overhead of shuffling significantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-3056) Sort-based Aggregation
[ https://issues.apache.org/jira/browse/SPARK-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-3056: --- Assignee: (was: Apache Spark) Sort-based Aggregation -- Key: SPARK-3056 URL: https://issues.apache.org/jira/browse/SPARK-3056 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Cheng Hao Currently, Spark SQL only supports hash-based aggregation, which may cause OOM if there are too many distinct keys in the input tuples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9227) Add option to set logging level in Spark Context Constructor
[ https://issues.apache.org/jira/browse/SPARK-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635655#comment-14635655 ] Sean Owen commented on SPARK-9227: -- Why? There are already logging-framework APIs for this, both config-driven and programmatic. Add option to set logging level in Spark Context Constructor Key: SPARK-9227 URL: https://issues.apache.org/jira/browse/SPARK-9227 Project: Spark Issue Type: Wish Reporter: Auberon López Priority: Minor It would be nice to be able to set the logging level in the constructor of a Spark Context. This provides a cleaner interface than needing to call setLoggingLevel after the context is already created. It would be especially helpful in a REPL environment where logging can clutter up the terminal and make it confusing for the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9227) Add option to set logging level in Spark Context Constructor
Auberon López created SPARK-9227: Summary: Add option to set logging level in Spark Context Constructor Key: SPARK-9227 URL: https://issues.apache.org/jira/browse/SPARK-9227 Project: Spark Issue Type: Wish Reporter: Auberon López Priority: Minor It would be nice to be able to set the logging level in the constructor of a Spark Context. This provides a cleaner interface than needing to call setLoggingLevel after the context is already created. It would be especially helpful in a REPL environment where logging can clutter up the terminal and make it confusing for the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9131) UDFs change data values
[ https://issues.apache.org/jira/browse/SPARK-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635690#comment-14635690 ] Reynold Xin commented on SPARK-9131: I see. Even if we have a relatively large dataset, as long as we can reproduce it, it'd be great to have. UDFs change data values --- Key: SPARK-9131 URL: https://issues.apache.org/jira/browse/SPARK-9131 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 1.4.0, 1.4.1 Environment: Pyspark 1.4 and 1.4.1 Reporter: Luis Guerra Priority: Critical I am having some trouble when using a custom udf in dataframes with pyspark 1.4. I have rewritten the udf to simplify the problem, and it gets even weirder. The udfs I am using do absolutely nothing: they just receive some value and output the same value with the same format. I show you my code below:
{code}
c = a.join(b, a['ID'] == b['ID_new'], 'inner')
c.filter(c['ID'] == '62698917').show()

udf_A = UserDefinedFunction(lambda x: x, DateType())
udf_B = UserDefinedFunction(lambda x: x, DateType())
udf_C = UserDefinedFunction(lambda x: x, DateType())

d = c.select(c['ID'], c['t1'].alias('ta'),
             udf_A(vinc_muestra['t2']).alias('tb'),
             udf_B(vinc_muestra['t1']).alias('tc'),
             udf_C(vinc_muestra['t2']).alias('td'))
d.filter(d['ID'] == '62698917').show()
{code}
I am showing here the results from the outputs:
{code}
+--------+--------+----------+----------+
|      ID|  ID_new|        t1|        t2|
+--------+--------+----------+----------+
|62698917|62698917|2012-02-28|2014-02-28|
|62698917|62698917|2012-02-20|2013-02-20|
|62698917|62698917|2012-02-28|2014-02-28|
|62698917|62698917|2012-02-20|2013-02-20|
|62698917|62698917|2012-02-20|2013-02-20|
|62698917|62698917|2012-02-28|2014-02-28|
|62698917|62698917|2012-02-28|2014-02-28|
|62698917|62698917|2012-02-20|2013-02-20|
+--------+--------+----------+----------+

+--------+----------+----------+----------+----------+
|      ID|        ta|        tb|        tc|        td|
+--------+----------+----------+----------+----------+
|62698917|2012-02-28|2007-03-05|2003-03-05|2014-02-28|
|62698917|2012-02-20|2007-02-15|2002-02-15|2013-02-20|
|62698917|2012-02-28|2007-03-10|2005-03-10|2014-02-28|
|62698917|2012-02-20|2007-03-05|2003-03-05|2013-02-20|
|62698917|2012-02-20|2013-08-02|2013-01-02|2013-02-20|
|62698917|2012-02-28|2007-02-15|2002-02-15|2014-02-28|
|62698917|2012-02-28|2007-02-15|2002-02-15|2014-02-28|
|62698917|2012-02-20|2014-01-02|2013-01-02|2013-02-20|
+--------+----------+----------+----------+----------+
{code}
The problem here is that the values in columns 'tb', 'tc' and 'td' in dataframe 'd' are completely different from the values of 't1' and 't2' in dataframe 'c', even though my udfs do nothing. It seems as if the values were somehow taken from other records (or just invented). Results differ between executions (apparently at random). Thanks in advance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9229) pyspark yarn-cluster PYSPARK_PYTHON not set
Eric Kimbrel created SPARK-9229: --- Summary: pyspark yarn-cluster PYSPARK_PYTHON not set Key: SPARK-9229 URL: https://issues.apache.org/jira/browse/SPARK-9229 Project: Spark Issue Type: Bug Affects Versions: 1.5.0 Environment: centos Reporter: Eric Kimbrel PYSPARK_PYTHON is set in spark-env.sh to use an alternative python installation. Use spark-submit to run a pyspark job in yarn with cluster deploy mode. PYSPARK_PYTHON is not set in the cluster environment, and the system default python is used instead of the intended original. test code: (simple.py)
{code}
from pyspark import SparkConf, SparkContext
import sys, os

conf = SparkConf()
sc = SparkContext(conf=conf)
out = [('PYTHON VERSION', str(sys.version))]
out.extend(zip(os.environ.keys(), os.environ.values()))
rdd = sc.parallelize(out)
rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
{code}
submit command: spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py I've also tried setting PYSPARK_PYTHON on the command line with no effect. It seems like there is no way to specify an alternative python executable in yarn-cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9229) pyspark yarn-cluster PYSPARK_PYTHON not set
[ https://issues.apache.org/jira/browse/SPARK-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Kimbrel updated SPARK-9229: Environment: centos Cloudera 5.4.1 based off Apache Hadoop 2.6.0, using spark 1.5.0 built for hadoop 2.6.0 from github master branch on 7.20.2015 (was: centos ) pyspark yarn-cluster PYSPARK_PYTHON not set Key: SPARK-9229 URL: https://issues.apache.org/jira/browse/SPARK-9229 Project: Spark Issue Type: Bug Affects Versions: 1.5.0 Environment: centos Cloudera 5.4.1 based off Apache Hadoop 2.6.0, using spark 1.5.0 built for hadoop 2.6.0 from github master branch on 7.20.2015 Reporter: Eric Kimbrel PYSPARK_PYTHON is set in spark-env.sh to use an alternative python installation. Use spark-submit to run a pyspark job in yarn with cluster deploy mode. PYSPARK_PYTHON is not set in the cluster environment, and the system default python is used instead of the intended original. test code: (simple.py)
{code}
from pyspark import SparkConf, SparkContext
import sys, os

conf = SparkConf()
sc = SparkContext(conf=conf)
out = [('PYTHON VERSION', str(sys.version))]
out.extend(zip(os.environ.keys(), os.environ.values()))
rdd = sc.parallelize(out)
rdd.coalesce(1).saveAsTextFile("hdfs://namenode/tmp/env")
{code}
submit command: spark-submit --master yarn --deploy-mode cluster --num-executors 1 simple.py I've also tried setting PYSPARK_PYTHON on the command line with no effect. It seems like there is no way to specify an alternative python executable in yarn-cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8357) Memory leakage on unsafe aggregation path with empty input
[ https://issues.apache.org/jira/browse/SPARK-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-8357. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7560 [https://github.com/apache/spark/pull/7560] Memory leakage on unsafe aggregation path with empty input -- Key: SPARK-8357 URL: https://issues.apache.org/jira/browse/SPARK-8357 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Assignee: Navis Priority: Critical Fix For: 1.5.0 Currently, the unsafe-based hash map is released on the 'next' call, but if the input is empty, it would never be called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9226) Change default log level to WARN in python REPL
Auberon López created SPARK-9226: Summary: Change default log level to WARN in python REPL Key: SPARK-9226 URL: https://issues.apache.org/jira/browse/SPARK-9226 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Auberon López Priority: Minor Fix For: 1.5.0 SPARK-7261 provides separate logging properties to be used in the Scala REPL, by default changing the logging level to WARN instead of INFO. The same improvement can be implemented for the Python REPL, which will make using PySpark interactively a cleaner experience that is closer to parity with the Scala shell. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9154) Implement code generation for StringFormat
[ https://issues.apache.org/jira/browse/SPARK-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635618#comment-14635618 ] Apache Spark commented on SPARK-9154: - User 'tarekauel' has created a pull request for this issue: https://github.com/apache/spark/pull/7571 Implement code generation for StringFormat -- Key: SPARK-9154 URL: https://issues.apache.org/jira/browse/SPARK-9154 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Tarek Auel Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9227) Add option to set logging level in Spark Context Constructor
[ https://issues.apache.org/jira/browse/SPARK-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635665#comment-14635665 ] Marcelo Vanzin commented on SPARK-9227: --- A programmatic API for this is overkill. If you want to do something, I'd suggest making the log level in the default log4j config a variable, so that you can override it by setting a system property. No need for API changes to make that work. Add option to set logging level in Spark Context Constructor Key: SPARK-9227 URL: https://issues.apache.org/jira/browse/SPARK-9227 Project: Spark Issue Type: Wish Reporter: Auberon López Priority: Minor It would be nice to be able to set the logging level in the constructor of a Spark Context. This provides a cleaner interface than needing to call setLoggingLevel after the context is already created. It would be especially helpful in a REPL environment where logging can clutter up the terminal and make it confusing for the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9228) Adjust Spark SQL Configs
Michael Armbrust created SPARK-9228: --- Summary: Adjust Spark SQL Configs Key: SPARK-9228 URL: https://issues.apache.org/jira/browse/SPARK-9228 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Blocker Before QA, let's flip on features and consolidate unsafe and codegen. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
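For reference, a sketch of flipping the two flags in question; the conf key names are assumptions from the 1.5 development line, and consolidation would presumably fold them under a single switch:
{code}
// sqlContext is the usual SQLContext handle; key names are assumptions,
// not confirmed by this ticket
sqlContext.setConf("spark.sql.codegen", "true")        // expression code generation
sqlContext.setConf("spark.sql.unsafe.enabled", "true") // unsafe row / memory path
{code}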
[jira] [Commented] (SPARK-8007) Support resolving virtual columns in DataFrames
[ https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635597#comment-14635597 ] Michael Armbrust commented on SPARK-8007: - I'm going to propose that we don't change the analyzer, but instead just use functions for all the cases that were specified. This is nice because we can never be ambiguous with a user column. Support resolving virtual columns in DataFrames --- Key: SPARK-8007 URL: https://issues.apache.org/jira/browse/SPARK-8007 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Joseph Batchik Create the infrastructure so we can resolve df(SPARK__PARTITION__ID) to the SparkPartitionID expression. A cool use case is to understand physical data skew: {code} df.groupBy(SPARK__PARTITION__ID).count() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
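A sketch of the function-based approach Michael describes: rather than teaching the analyzer to resolve a magic column name, expose the expression through an ordinary function so it can never collide with a user column. The helper name is hypothetical, SparkPartitionID is the expression named in the issue, and the sketch assumes the expression is visible to the caller:
{code}
import org.apache.spark.sql.Column
import org.apache.spark.sql.execution.expressions.SparkPartitionID

// hypothetical helper: wrap the expression in a Column instead of
// resolving a virtual column name in the analyzer
def sparkPartitionId(): Column = new Column(SparkPartitionID)

// the physical-skew check from the description, function style
val df = sqlContext.range(0, 1000).toDF("id")
df.groupBy(sparkPartitionId()).count().show()
{code}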
[jira] [Updated] (SPARK-9213) Improve regular expression performance (via joni)
[ https://issues.apache.org/jira/browse/SPARK-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9213: --- Description: I'm creating an umbrella ticket to improve regular expression performance for string expressions. Right now our use of regular expressions is inefficient for two reasons: 1. Java regex in general is slow. 2. We have to convert everything from UTF8 encoded bytes into Java String, and then run regex on it, and then convert it back. There are libraries in Java that provide regex support directly on UTF8 encoded bytes. One prominent example is joni, used in JRuby. Note: all regex functions are in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala was: I'm creating an umbrella ticket to improve regular expression performance for string expressions. Right now our use of regular expressions is inefficient for two reasons: 1. Java regex in general is slow. 2. We have to convert everything from UTF8 encoded bytes into Java String, and then run regex on it, and then convert it back. There are libraries in Java that provide regex support directly on UTF8 encoded bytes. One prominent example is joni, used in JRuby. Improve regular expression performance (via joni) - Key: SPARK-9213 URL: https://issues.apache.org/jira/browse/SPARK-9213 Project: Spark Issue Type: Umbrella Components: SQL Reporter: Reynold Xin I'm creating an umbrella ticket to improve regular expression performance for string expressions. Right now our use of regular expressions is inefficient for two reasons: 1. Java regex in general is slow. 2. We have to convert everything from UTF8 encoded bytes into Java String, and then run regex on it, and then convert it back. There are libraries in Java that provide regex support directly on UTF8 encoded bytes. One prominent example is joni, used in JRuby. Note: all regex functions are in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
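For a feel of the byte-level API, a minimal joni sketch (the pattern and input are arbitrary; the point is that matching runs directly over UTF-8 bytes, with no String round trip):
{code}
import org.jcodings.specific.UTF8Encoding
import org.joni.{Option => JOption, Regex}

val pattern = "fo.*ar".getBytes("UTF-8")
val regex = new Regex(pattern, 0, pattern.length, JOption.NONE, UTF8Encoding.INSTANCE)

val input = "foobar".getBytes("UTF-8") // e.g. the bytes backing a UTF8String
val matcher = regex.matcher(input)
// byte offset of the first match, or -1 if there is none
val pos = matcher.search(0, input.length, JOption.DEFAULT)
{code}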
[jira] [Commented] (SPARK-8103) DAGScheduler should not launch multiple concurrent attempts for one stage on fetch failures
[ https://issues.apache.org/jira/browse/SPARK-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635663#comment-14635663 ] Apache Spark commented on SPARK-8103: - User 'markhamstra' has created a pull request for this issue: https://github.com/apache/spark/pull/7572 DAGScheduler should not launch multiple concurrent attempts for one stage on fetch failures --- Key: SPARK-8103 URL: https://issues.apache.org/jira/browse/SPARK-8103 Project: Spark Issue Type: Bug Components: Scheduler, Spark Core Affects Versions: 1.4.0 Reporter: Imran Rashid Assignee: Imran Rashid Fix For: 1.5.0 When there is a fetch failure, {{DAGScheduler}} is supposed to fail the stage, retry the necessary portions of the preceding shuffle stage which generated the shuffle data, and eventually rerun the stage. We generally expect to get multiple fetch failures together, but only want to re-start the stage once. The code already makes an attempt to address this https://github.com/apache/spark/blob/10ba1880878d0babcdc5c9b688df5458ea131531/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1108 .
{code}
// It is likely that we receive multiple FetchFailed for a single stage (because we have
// multiple tasks running concurrently on different executors). In that case, it is possible
// the fetch failure has already been handled by the scheduler.
if (runningStages.contains(failedStage)) {
{code}
However, this logic is flawed because the stage may have been **resubmitted** by the time we get these fetch failures. In that case, {{runningStages.contains(failedStage)}} will be true, but we've already handled these failures. This results in multiple concurrent non-zombie attempts for one stage. In addition to being very confusing and a waste of resources, this can also lead to later stages being submitted before the previous stage has registered its map output. This happens because (a) when one attempt finishes all its tasks, it may not register its map output because the stage still has pending tasks from other attempts https://github.com/apache/spark/blob/10ba1880878d0babcdc5c9b688df5458ea131531/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1046
{code}
if (runningStages.contains(shuffleStage) && shuffleStage.pendingTasks.isEmpty) {
{code}
and (b) {{submitStage}} thinks the following stage is ready to go, because {{getMissingParentStages}} thinks the stage is complete as long as it has all of its map outputs: https://github.com/apache/spark/blob/10ba1880878d0babcdc5c9b688df5458ea131531/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L397
{code}
if (!mapStage.isAvailable) { missing += mapStage }
{code}
So the following stage is submitted repeatedly, but it is doomed to fail because its shuffle output has never been registered with the map output tracker. Here's an example failure in this case:
{noformat}
WARN TaskSetManager: Lost task 5.0 in stage 3.2 (TID 294, 192.168.1.104): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=5, message= org.apache.spark.shuffle.MetadataFetchFailedException: Missing output locations for shuffle ...
{noformat}
Note that this is a subset of the problems originally described in SPARK-7308, limited to just the issues affecting the DAGScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
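To make the de-duplication idea concrete, a self-contained toy model (all names hypothetical; this is not the actual patch): track the latest attempt per stage and ignore FetchFailed events from older attempts, so a stage that has already been resubmitted is not failed again by stragglers:
{code}
import scala.collection.mutable

case class FetchFailure(stageId: Int, stageAttemptId: Int)

class ToyScheduler {
  private val latestAttempt = mutable.Map.empty[Int, Int].withDefaultValue(0)
  private val running = mutable.Set.empty[Int]

  def handleFetchFailure(f: FetchFailure): Unit = {
    // stale: the stage was already resubmitted under a newer attempt id
    val stale = f.stageAttemptId != latestAttempt(f.stageId)
    if (stale || !running.contains(f.stageId)) return
    running -= f.stageId
    latestAttempt(f.stageId) += 1 // the resubmission runs as a new attempt
    // ... resubmit the parent map stage, then re-run this stage ...
  }
}
{code}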
[jira] [Created] (SPARK-9233) Enable code-gen in window function unit tests
Yin Huai created SPARK-9233: --- Summary: Enable code-gen in window function unit tests Key: SPARK-9233 URL: https://issues.apache.org/jira/browse/SPARK-9233 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Right now, our {{HiveWindowFunctionQuerySuite.scala}} sets code-gen to false. Since code-gen is now enabled by default, we need to enable it for the tests in this file and fix any bugs we find. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9222) Make class instantiation variables in DistributedLDAModel private[clustering]
[ https://issues.apache.org/jira/browse/SPARK-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635729#comment-14635729 ] Apache Spark commented on SPARK-9222: - User 'MechCoder' has created a pull request for this issue: https://github.com/apache/spark/pull/7573 Make class instantiation variables in DistributedLDAModel private[clustering] -- Key: SPARK-9222 URL: https://issues.apache.org/jira/browse/SPARK-9222 Project: Spark Issue Type: Test Components: MLlib Reporter: Manoj Kumar Priority: Minor This would enable testing the various class variables, such as docConcentration and topicConcentration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
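A sketch of the visibility change (the constructor parameter list is abridged and partly an assumption), making the fields readable from unit tests in the same package:
{code}
package org.apache.spark.mllib.clustering

import org.apache.spark.mllib.linalg.Vector

// package-private constructor and members: tests under
// org.apache.spark.mllib.clustering can read these directly
class DistributedLDAModel private[clustering] (
    private[clustering] val docConcentration: Vector,
    private[clustering] val topicConcentration: Double)
{code}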