[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2016-03-01 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174807#comment-15174807
 ] 

Jaka Jancar commented on SPARK-7768:


[~randallwhitman] UDT, not UDF: 
https://github.com/apache/spark/blob/v1.6.0/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala
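
For reference, a minimal sketch of what implementing it looks like against the 1.6 API, modeled loosely on MLlib's {{VectorUDT}} ({{Point}} and {{PointUDT}} are hypothetical, and since the API is still non-public the details may shift):

{code}
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

// Hypothetical user type; the annotation ties it to its UDT.
@SQLUserDefinedType(udt = classOf[PointUDT])
case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {

  // Underlying Catalyst storage: a fixed-length array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // User object -> Catalyst internal representation.
  override def serialize(obj: Any): Any = obj match {
    case Point(x, y) => new GenericArrayData(Array[Any](x, y))
  }

  // Catalyst internal representation -> user object.
  override def deserialize(datum: Any): Point = datum match {
    case a: ArrayData => Point(a.getDouble(0), a.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
{code}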

> Make user-defined type (UDT) API public
> ---
>
> Key: SPARK-7768
> URL: https://issues.apache.org/jira/browse/SPARK-7768
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Xiangrui Meng
>Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it 
> would be nice to make the UDT API public in 1.5.






[jira] [Created] (SPARK-13480) Regression with percentile() + function in GROUP BY

2016-02-24 Thread Jaka Jancar (JIRA)
Jaka Jancar created SPARK-13480:
---

 Summary: Regression with percentile() + function in GROUP BY
 Key: SPARK-13480
 URL: https://issues.apache.org/jira/browse/SPARK-13480
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jaka Jancar


{code}
SELECT
  percentile(load_time, 0.50)
FROM
  (
select '2000-01-01' queued_at, 100 load_time
union all
select '2000-01-01' queued_at, 110 load_time
union all
select '2000-01-01' queued_at, 120 load_time
  ) t
GROUP BY
  year(queued_at)
{code}

fails with

{code}
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 0 in stage 6067.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6067.0 (TID 268774, ip-10-0-163-203.ec2.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: year(cast(queued_at#78201 as date))#78209
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:242)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:233)
at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:62)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.Exchange.org$apache$spark$sql$execution$Exchange$$getPartitionKeyExtractor$1(Exchange.scala:197)
at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:209)
at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:208)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Couldn't find year(cast(queued_at#78201 as date))#78209 in [queued_at#78201,load_time#78202]
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:92)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:86)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 33 more
{code}

This used to work (not sure whether on 1.5 or 1.4).
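
A possible workaround sketch (untested against this exact regression, and assuming a HiveContext since {{percentile}} is a Hive UDAF): materialize the grouping expression as a plain column in a subquery, so the shuffle only has to bind an attribute rather than the {{year(...)}} expression. {{queued_year}} is a hypothetical alias:

{code}
sqlContext.sql("""
  SELECT percentile(load_time, 0.50)
  FROM (
    SELECT year(queued_at) AS queued_year, load_time
    FROM (
      SELECT '2000-01-01' queued_at, 100 load_time UNION ALL
      SELECT '2000-01-01' queued_at, 110 load_time UNION ALL
      SELECT '2000-01-01' queued_at, 120 load_time
    ) raw
  ) t
  GROUP BY queued_year
""")
{code}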




[jira] [Commented] (SPARK-11966) Spark API for UDTFs

2016-01-11 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093174#comment-15093174
 ] 

Jaka Jancar commented on SPARK-11966:
-

[~marmbrus] [~rlgarris_databricks] Any chance of getting this on the roadmap? 
Or, can you suggest a workaround that does not require the user to specify the 
column names in SQL (but instead has them come from the UDTF)?
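
The best I have so far is a sketch like the following, which still goes through a plain array-returning UDF ({{Out}} and {{my_create_table}} are hypothetical names; Spark 1.6 API). At least the column names travel with the case class instead of being spelled out in SQL:

{code}
import org.apache.spark.sql.functions.{callUDF, explode, lit}

// Hypothetical output row shape; its field names become the column names.
case class Out(name: String, score: Double)

// An ordinary UDF returning an array of structs stands in for the UDTF.
sqlContext.udf.register("my_create_table", (n: Int) =>
  (1 to n).map(i => Out("row" + i, i.toDouble)))

// No explicit column list: explode() yields a struct column whose fields
// are already named by the case class.
sqlContext.range(1)
  .select(explode(callUDF("my_create_table", lit(3))).as("r"))
  .select("r.name", "r.score")
  .show()
{code}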

> Spark API for UDTFs
> ---
>
> Key: SPARK-11966
> URL: https://issues.apache.org/jira/browse/SPARK-11966
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Jaka Jancar
>Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not table-generating 
> functions. For those you still have to use these horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java






[jira] [Created] (SPARK-12401) Add support for enums in postgres

2015-12-17 Thread Jaka Jancar (JIRA)
Jaka Jancar created SPARK-12401:
---

 Summary: Add support for enums in postgres
 Key: SPARK-12401
 URL: https://issues.apache.org/jira/browse/SPARK-12401
 Project: Spark
  Issue Type: New Feature
Affects Versions: 1.6.0
Reporter: Jaka Jancar


JSON and JSONB types [are now 
converted|https://github.com/apache/spark/pull/8948/files] into strings on the 
Spark side instead of throwing. It would be great if [enumerated 
types|http://www.postgresql.org/docs/current/static/datatype-enum.html] were 
treated similarly instead of failing.
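
In the meantime, a hedged user-side workaround sketch: register a custom {{JdbcDialect}} that reads the enum back as a string. This assumes the driver reports the enum column as {{java.sql.Types.OTHER}} (as it did for json/jsonb); {{mood}} is a hypothetical enum type name:

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

object PostgresEnumDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:postgresql")

  // Map the hypothetical "mood" enum to a plain string; everything else
  // falls through to the built-in PostgresDialect.
  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.OTHER && typeName == "mood") Some(StringType)
    else None
}

JdbcDialects.registerDialect(PostgresEnumDialect)
{code}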






[jira] [Commented] (SPARK-11966) Spark API for UDTFs

2015-11-30 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032906#comment-15032906
 ] 

Jaka Jancar commented on SPARK-11966:
-

Not sure I understand. I would like to do {{SELECT * FROM 
my_create_table(...)}}.

Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.


> Spark API for UDTFs
> ---
>
> Key: SPARK-11966
> URL: https://issues.apache.org/jira/browse/SPARK-11966
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Jaka Jancar
>Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not table-generating 
> functions. For those you still have to use these horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java






[jira] [Comment Edited] (SPARK-11966) Spark API for UDTFs

2015-11-30 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032906#comment-15032906
 ] 

Jaka Jancar edited comment on SPARK-11966 at 12/1/15 1:59 AM:
--

Not sure I understand. I would like to do {{SELECT * FROM 
my_create_table(...)}}.

Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.

//edit: In reality, this would be part of a JOIN or a lateral view. I would like 
it to be doable with SQL only.


was (Author: jakajancar):
Not sure I understand. I would like to do {{SELECT * FROM 
my_create_table(...)}}.

Right now, all I can do is {{SELECT * FROM explode(my_create_array(...))}}.


> Spark API for UDTFs
> ---
>
> Key: SPARK-11966
> URL: https://issues.apache.org/jira/browse/SPARK-11966
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Jaka Jancar
>Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not table-generating 
> functions. For those you still have to use these horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java






[jira] [Created] (SPARK-11966) Spark API for UDTFs

2015-11-24 Thread Jaka Jancar (JIRA)
Jaka Jancar created SPARK-11966:
---

 Summary: Spark API for UDTFs
 Key: SPARK-11966
 URL: https://issues.apache.org/jira/browse/SPARK-11966
 Project: Spark
  Issue Type: New Feature
Reporter: Jaka Jancar


Defining UDFs is easy using sqlContext.udf.register, but not table-generating 
functions. For those you still have to use these horrendous Hive interfaces:

https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java







[jira] [Updated] (SPARK-11966) Spark API for UDTFs

2015-11-24 Thread Jaka Jancar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaka Jancar updated SPARK-11966:

Priority: Minor  (was: Major)

> Spark API for UDTFs
> ---
>
> Key: SPARK-11966
> URL: https://issues.apache.org/jira/browse/SPARK-11966
> Project: Spark
>  Issue Type: New Feature
>Reporter: Jaka Jancar
>Priority: Minor
>
> Defining UDFs is easy using sqlContext.udf.register, but not table-generating 
> functions. For those you still have to use these horrendous Hive interfaces:
> https://github.com/prongs/apache-hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java






[jira] [Comment Edited] (SPARK-10171) AWS Lambda Executors

2015-08-22 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708175#comment-14708175
 ] 

Jaka Jancar edited comment on SPARK-10171 at 8/22/15 9:03 PM:
--

You can start a task via HTTP in a synchronous (response on completion) or 
asynchronous way.

Not sure I understand how this doesn't fit into the Spark model. Seems like the 
ideal cluster to me :)


was (Author: jakajancar):
You can start a task via a HTTP in a synchronous (response on completion) or 
asynchronous way.

Not sure I understand how this doesn't fit into the Spark model. Seems like the 
ideal cluster to me :)

> AWS Lambda Executors
> ---
>
> Key: SPARK-10171
> URL: https://issues.apache.org/jira/browse/SPARK-10171
> Project: Spark
>  Issue Type: Wish
>Reporter: Jaka Jancar
>Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in 
> addition to Standalone, Mesos and YARN, getting rid of the concept of a 
> cluster and having a single infinite-sized one.






[jira] [Commented] (SPARK-10171) AWS Lambda Executors

2015-08-22 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708175#comment-14708175
 ] 

Jaka Jancar commented on SPARK-10171:
-

You can start a task via HTTP in a synchronous (response on completion) or 
asynchronous way.

Not sure I understand how this doesn't fit into the Spark model. Seems like the 
ideal cluster to me :)

> AWS Lambda Executors
> ---
>
> Key: SPARK-10171
> URL: https://issues.apache.org/jira/browse/SPARK-10171
> Project: Spark
>  Issue Type: Wish
>Reporter: Jaka Jancar
>Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in 
> addition to Standalone, Mesos and YARN, getting rid of the concept of a 
> cluster and having a single infinite-sized one.






[jira] [Commented] (SPARK-10171) AWS Lambda Executors

2015-08-22 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708188#comment-14708188
 ] 

Jaka Jancar commented on SPARK-10171:
-

Oh sure, we have no problems running Spark today. But embrace the future, where 
we rent containers by the second, not VMs by the hour :)

> AWS Lambda Executors
> ---
>
> Key: SPARK-10171
> URL: https://issues.apache.org/jira/browse/SPARK-10171
> Project: Spark
>  Issue Type: Wish
>Reporter: Jaka Jancar
>Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in 
> addition to Standalone, Mesos and YARN, getting rid of the concept of a 
> cluster and having a single infinite-sized one.






[jira] [Updated] (SPARK-10171) AWS Lambda Executors

2015-08-22 Thread Jaka Jancar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaka Jancar updated SPARK-10171:

Description: 
It would be great if Spark supported using AWS Lambda for execution in addition 
to Standalone, Mesos and YARN, getting rid of the concept of a cluster and 
having a single infinite-sized one.

A couple of problems I see today:
  - Execution time is limited to 60s. This will probably change in the future.
  - Burstiness is still not very high.


  was:It would be great if Spark supported using AWS Lambda for execution in 
addition to Standalone, Mesos and YARN, getting rid of the concept of a 
cluster and having a single infinite-sized one.


> AWS Lambda Executors
> ---
>
> Key: SPARK-10171
> URL: https://issues.apache.org/jira/browse/SPARK-10171
> Project: Spark
>  Issue Type: Wish
>Reporter: Jaka Jancar
>Priority: Minor
>
> It would be great if Spark supported using AWS Lambda for execution in 
> addition to Standalone, Mesos and YARN, getting rid of the concept of a 
> cluster and having a single infinite-sized one.
> A couple of problems I see today:
>   - Execution time is limited to 60s. This will probably change in the future.
>   - Burstiness is still not very high.






[jira] [Created] (SPARK-10171) AWS Lambda Executors

2015-08-22 Thread Jaka Jancar (JIRA)
Jaka Jancar created SPARK-10171:
---

 Summary: AWS Lambda Executors
 Key: SPARK-10171
 URL: https://issues.apache.org/jira/browse/SPARK-10171
 Project: Spark
  Issue Type: Wish
Reporter: Jaka Jancar
Priority: Minor


It would be great if Spark supported using AWS Lambda for execution in addition 
to Standalone, Mesos and YARN, getting rid of the concept of a cluster and 
having a single infinite-sized one.






[jira] [Commented] (SPARK-8156) Respect current database when creating datasource tables

2015-06-22 Thread Jaka Jancar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597015#comment-14597015
 ] 

Jaka Jancar commented on SPARK-8156:


Can this be backported to 1.4?
I can prepare a pull request if needed.

> Respect current database when creating datasource tables
> ---
>
> Key: SPARK-8156
> URL: https://issues.apache.org/jira/browse/SPARK-8156
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: baishuo
>Assignee: baishuo
> Fix For: 1.5.0





