[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174807#comment-15174807 ]

Jaka Jancar commented on SPARK-7768:
------------------------------------

[~randallwhitman] UDT, not UDF: https://github.com/apache/spark/blob/v1.6.0/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala

> Make user-defined type (UDT) API public
> ---------------------------------------
>
>                 Key: SPARK-7768
>                 URL: https://issues.apache.org/jira/browse/SPARK-7768
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it
> would be nice to make the UDT API public in 1.5.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
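For context, the linked UserDefinedType.scala implies an API of roughly the following shape. This is a sketch against the Spark 1.6 source, not an official example; the Point/PointUDT names are illustrative, and the exact serialized representation expected by Catalyst varies between Spark versions:

```scala
import org.apache.spark.sql.types._

// Hypothetical user class, annotated with its UDT (annotation per the 1.6 source).
@SQLUserDefinedType(udt = classOf[PointUDT])
case class Point(x: Double, y: Double)

// Sketch of a UDT: map Point to/from an array-of-doubles SQL type.
class PointUDT extends UserDefinedType[Point] {
  // How the value appears to Spark SQL.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // Convert a user-space Point into the SQL representation.
  override def serialize(obj: Any): Any = obj match {
    case Point(x, y) => Seq(x, y)
  }

  // Convert the SQL representation back into a Point.
  override def deserialize(datum: Any): Point = datum match {
    case values: Seq[_] =>
      val Seq(x: Double, y: Double) = values
      Point(x, y)
  }

  override def userClass: Class[Point] = classOf[Point]
}
```

Since the API was not public at the time of this comment, code like this had to live under an `org.apache.spark` package or rely on developer-API annotations.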
[jira] [Created] (SPARK-13480) Regression with percentile() + function in GROUP BY
Jaka Jancar created SPARK-13480:
-----------------------------------

             Summary: Regression with percentile() + function in GROUP BY
                 Key: SPARK-13480
                 URL: https://issues.apache.org/jira/browse/SPARK-13480
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Jaka Jancar

{code}
SELECT percentile(load_time, 0.50)
FROM (
          select '2000-01-01' queued_at, 100 load_time
union all select '2000-01-01' queued_at, 110 load_time
union all select '2000-01-01' queued_at, 120 load_time
) t
GROUP BY year(queued_at)
{code}

fails with

{code}
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 0 in stage 6067.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6067.0 (TID 268774, ip-10-0-163-203.ec2.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: year(cast(queued_at#78201 as date))#78209
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:242)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:233)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:62)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
	at org.apache.spark.sql.execution.Exchange.org$apache$spark$sql$execution$Exchange$$getPartitionKeyExtractor$1(Exchange.scala:197)
	at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:209)
	at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:208)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Couldn't find year(cast(queued_at#78201 as date))#78209 in [queued_at#78201,load_time#78202]
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:92)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:86)
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
	... 33 more
{code}

This used to work (not sure whether on 1.5 or 1.4).
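The error says the planner tried to bind the computed grouping expression `year(cast(queued_at ...))` against the child's raw output columns. A plausible workaround (a sketch, not verified against 1.6.0) is to materialize the grouping expression as a named column in a subquery, so the aggregate groups by a plain attribute instead of a function call:

```sql
SELECT percentile(load_time, 0.50)
FROM (
  SELECT year(queued_at) AS queued_year, load_time
  FROM (
              select '2000-01-01' queued_at, 100 load_time
    union all select '2000-01-01' queued_at, 110 load_time
    union all select '2000-01-01' queued_at, 120 load_time
  ) t
) t2
GROUP BY queued_year
```

Here `queued_year` is an illustrative alias; the inner data is the same inline union from the repro above.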
[jira] [Commented] (SPARK-11966) Spark API for UDTFs
[ https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093174#comment-15093174 ]

Jaka Jancar commented on SPARK-11966:
-------------------------------------

[~marmbrus] [~rlgarris_databricks] Any chance of getting this on the roadmap? Or, can you suggest a workaround that does not require the user to specify the column names in SQL (but instead have them come from the UDTF)?

> Spark API for UDTFs
> -------------------
>
>                 Key: SPARK-11966
>                 URL: https://issues.apache.org/jira/browse/SPARK-11966
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not
> table-generating functions. For those you still have to use these
> horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
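The partial workaround alluded to later in this thread can be sketched as follows: register an ordinary UDF that returns an array of structs, then flatten it with `explode`. Column names then come from the case class rather than the SQL text, but the SQL side still needs `explode`/`LATERAL VIEW`, which is exactly the limitation the ticket asks to remove. Names here (`tokenize`, `Token`) are hypothetical:

```scala
// Sketch against the Spark 1.x SQLContext API; assumes an existing sqlContext.
case class Token(word: String, length: Int)

// A "UDTF" emulated as a UDF returning an array of structs.
sqlContext.udf.register("tokenize", (s: String) =>
  s.split("\\s+").map(w => Token(w, w.length)))

// The output schema (word, length) comes from Token, not from the SQL text,
// but the query still has to go through explode:
val df = sqlContext.sql(
  "SELECT t.word, t.length FROM (SELECT explode(tokenize('a bb ccc')) AS t) x")
```

This corresponds to the `SELECT * FROM explode(my_create_array(...))` shape mentioned below, as opposed to the desired `SELECT * FROM my_create_table(...)`.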
[jira] [Created] (SPARK-12401) Add support for enums in postgres
Jaka Jancar created SPARK-12401:
-----------------------------------

             Summary: Add support for enums in postgres
                 Key: SPARK-12401
                 URL: https://issues.apache.org/jira/browse/SPARK-12401
             Project: Spark
          Issue Type: New Feature
    Affects Versions: 1.6.0
            Reporter: Jaka Jancar

JSON and JSONB types [are now converted|https://github.com/apache/spark/pull/8948/files] into strings on the Spark side instead of throwing. It would be great if [enumerated types|http://www.postgresql.org/docs/current/static/datatype-enum.html] were treated similarly instead of failing.
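Until such support lands, one common workaround (a sketch, assuming a hypothetical table `orders` with an enum column `status`) is to push a cast to text into the JDBC source query, so the Postgres driver reports a varchar instead of an enum:

```scala
// Sketch against the Spark 1.x DataFrameReader JDBC API; the table and
// column names are illustrative, not from the ticket.
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/mydb")
  // A parenthesized subquery is accepted wherever a table name is,
  // letting us cast the enum column server-side.
  .option("dbtable", "(SELECT id, status::text AS status FROM orders) AS o")
  .load()
```

The cast happens in Postgres, so Spark only ever sees a string column.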
[jira] [Commented] (SPARK-11966) Spark API for UDTFs
[ https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032906#comment-15032906 ]

Jaka Jancar commented on SPARK-11966:
-------------------------------------

Not sure I understand. I would like to do {{SELECT * FROM my_create_table(...)}}. Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.

> Spark API for UDTFs
> -------------------
>
>                 Key: SPARK-11966
>                 URL: https://issues.apache.org/jira/browse/SPARK-11966
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not
> table-generating functions. For those you still have to use these
> horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
[jira] [Comment Edited] (SPARK-11966) Spark API for UDTFs
[ https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032906#comment-15032906 ]

Jaka Jancar edited comment on SPARK-11966 at 12/1/15 1:59 AM:
--------------------------------------------------------------

Not sure I understand. I would like to do {{SELECT * FROM my_create_table(...)}}. Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.

//edit: In reality, this would be a part of JOIN or lateral view. I would like it to be doable with only SQL.

was (Author: jakajancar):
Not sure I understand. I would like to do {{SELECT * FROM my_create_table(...)}}. Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.

> Spark API for UDTFs
> -------------------
>
>                 Key: SPARK-11966
>                 URL: https://issues.apache.org/jira/browse/SPARK-11966
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not
> table-generating functions. For those you still have to use these
> horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
[jira] [Created] (SPARK-11966) Spark API for UDTFs
Jaka Jancar created SPARK-11966:
-----------------------------------

             Summary: Spark API for UDTFs
                 Key: SPARK-11966
                 URL: https://issues.apache.org/jira/browse/SPARK-11966
             Project: Spark
          Issue Type: New Feature
            Reporter: Jaka Jancar

Defining UDFs is easy using sqlContext.udf.register, but not table-generating functions. For those you still have to use these horrendous Hive interfaces: https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
[jira] [Updated] (SPARK-11966) Spark API for UDTFs
[ https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaka Jancar updated SPARK-11966:
--------------------------------
    Priority: Minor  (was: Major)

> Spark API for UDTFs
> -------------------
>
>                 Key: SPARK-11966
>                 URL: https://issues.apache.org/jira/browse/SPARK-11966
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not
> table-generating functions. For those you still have to use these
> horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
[jira] [Comment Edited] (SPARK-10171) AWS Lambda Executors
[ https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708175#comment-14708175 ]

Jaka Jancar edited comment on SPARK-10171 at 8/22/15 9:03 PM:
--------------------------------------------------------------

You can start a task via HTTP in a synchronous (response on completion) or asynchronous way. Not sure I understand how this doesn't fit into the Spark model. Seems like the ideal cluster to me :)

was (Author: jakajancar):
You can start a task via a HTTP in a synchronous (response on completion) or asynchronous way. Not sure I understand how this doesn't fit into the Spark model. Seems like the ideal cluster to me :)

> AWS Lambda Executors
> --------------------
>
>                 Key: SPARK-10171
>                 URL: https://issues.apache.org/jira/browse/SPARK-10171
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in
> addition to Standalone, Mesos and YARN, getting rid of the concept of a
> cluster and having a single infinite-sized one.
[jira] [Commented] (SPARK-10171) AWS Lambda Executors
[ https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708175#comment-14708175 ]

Jaka Jancar commented on SPARK-10171:
-------------------------------------

You can start a task via HTTP in a synchronous (response on completion) or asynchronous way. Not sure I understand how this doesn't fit into the Spark model. Seems like the ideal cluster to me :)

> AWS Lambda Executors
> --------------------
>
>                 Key: SPARK-10171
>                 URL: https://issues.apache.org/jira/browse/SPARK-10171
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in
> addition to Standalone, Mesos and YARN, getting rid of the concept of a
> cluster and having a single infinite-sized one.
[jira] [Commented] (SPARK-10171) AWS Lambda Executors
[ https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708188#comment-14708188 ]

Jaka Jancar commented on SPARK-10171:
-------------------------------------

Oh sure, we have no problems running Spark today. But embrace the future, where we rent containers by the second, not VMs by the hour :)

> AWS Lambda Executors
> --------------------
>
>                 Key: SPARK-10171
>                 URL: https://issues.apache.org/jira/browse/SPARK-10171
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in
> addition to Standalone, Mesos and YARN, getting rid of the concept of a
> cluster and having a single infinite-sized one.
[jira] [Updated] (SPARK-10171) AWS Lambda Executors
[ https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaka Jancar updated SPARK-10171:
--------------------------------
    Description: 
It would be great if Spark supported using AWS Lambda for execution in addition to Standalone, Mesos and YARN, getting rid of the concept of a cluster and having a single infinite-sized one.

A couple of problems I see today:
- Execution time is limited to 60s. This will probably change in the future.
- Burstiness is still not very high.

  was: It would be great if Spark supported using AWS Lambda for execution in addition to Standalone, Mesos and YARN, getting rid of the concept of a cluster and having a single infinite-sized one.

> AWS Lambda Executors
> --------------------
>
>                 Key: SPARK-10171
>                 URL: https://issues.apache.org/jira/browse/SPARK-10171
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Jaka Jancar
>            Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in
> addition to Standalone, Mesos and YARN, getting rid of the concept of a
> cluster and having a single infinite-sized one.
>
> A couple of problems I see today:
> - Execution time is limited to 60s. This will probably change in the future.
> - Burstiness is still not very high.
[jira] [Created] (SPARK-10171) AWS Lambda Executors
Jaka Jancar created SPARK-10171:
-----------------------------------

             Summary: AWS Lambda Executors
                 Key: SPARK-10171
                 URL: https://issues.apache.org/jira/browse/SPARK-10171
             Project: Spark
          Issue Type: Wish
            Reporter: Jaka Jancar
            Priority: Minor

It would be great if Spark supported using AWS Lambda for execution in addition to Standalone, Mesos and YARN, getting rid of the concept of a cluster and having a single infinite-sized one.
[jira] [Commented] (SPARK-8156) Respect current database when creating datasource tables
[ https://issues.apache.org/jira/browse/SPARK-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597015#comment-14597015 ]

Jaka Jancar commented on SPARK-8156:
------------------------------------

Can this be backported into 1.4? I can prepare a pull request, if needed.

> Respect current database when creating datasource tables
> --------------------------------------------------------
>
>                 Key: SPARK-8156
>                 URL: https://issues.apache.org/jira/browse/SPARK-8156
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: baishuo
>            Assignee: baishuo
>             Fix For: 1.5.0