Re: Workflow Scheduler for Spark
I created Jira https://issues.apache.org/jira/browse/SPARK-3714 and a design doc https://docs.google.com/document/d/1q2Q8Ux-6uAkH7wtLJpc3jz-GfrDEjlbWlXtf20hvguk/edit?usp=sharing on this matter.

2014-09-17 22:28 GMT+04:00 Reynold Xin r...@databricks.com: There might've been some misunderstanding. I was referring to the MLlib pipeline design doc when I said the design doc was posted, in response to the first paragraph of your original email.

On Wed, Sep 17, 2014 at 2:47 AM, Egor Pahomov pahomov.e...@gmail.com wrote: That's the doc about MLlib pipeline functionality. What about an Oozie-like workflow?

2014-09-17 13:08 GMT+04:00 Mark Hamstra m...@clearstorydata.com: See https://issues.apache.org/jira/browse/SPARK-3530 and this doc, referenced in that JIRA: https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing

On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov pahomov.e...@gmail.com wrote: I have problems using Oozie. For example, it doesn't sustain a Spark context the way the Ooyala job server does. Outside of GUI interfaces like Hue it's hard to work with: scoozie stopped development a year ago (I spoke with its creator), and Oozie XML is very hard to write. Oozie still has all its documentation and code in the MR model rather than the YARN model, and based on its current pace of development I can't expect radical changes in the near future. There is no "Databricks for Oozie" with people on salary to drive that kind of radical change. It's a dinosaur. Reynold, can you help me find this doc? Do you mean just pipelining Spark code, or additional logic such as task persistence, a job server, task retries, data availability, and so on?

2014-09-17 11:21 GMT+04:00 Reynold Xin r...@databricks.com: Hi Egor, I think the design doc for the pipeline feature has been posted. For the workflow, I believe Oozie actually works fine with Spark if you want some external workflow system. Do you have any trouble using that?
On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov pahomov.e...@gmail.com wrote: There are two things we (Yandex) miss in Spark: good MLlib abstractions and a good workflow job scheduler. From the threads "Adding abstraction in MLlib" and "[mllib] State of Multi-Model training" I got the idea that Databricks is working on the former and that we should wait until the first design doc is posted to guide us. What about a workflow scheduler? Is anyone already working on one? Does anyone have a plan for doing it?

P.S. We thought that the MLlib abstractions for running multiple algorithms over the same data would need such a scheduler, one that would rerun an algorithm in case of failure. I understand that Spark provides fault tolerance out of the box, but we found an Oozie-like scheduler more reliable for such long-living workflows.

-- Sincerely yours, Egor Pakhomov, Scala Developer, Yandex
Re: SparkSQL: map type MatchError when inserting into Hive table
It turned out to be a bug in my code. In the select clause, the list of fields was misaligned with the schema of the target table. As a consequence, the map data couldn't be cast to some other type in the schema. Thanks anyway.

On 9/26/14, 8:08 PM, Cheng Lian lian.cs@gmail.com wrote: Would you mind providing the DDL of this partitioned table together with the query you tried? The stack trace suggests that the query was trying to cast a map into something else, which is not supported in Spark SQL. And I doubt whether Hive supports casting a complex type to some other type.

On 9/27/14 7:48 AM, Du Li wrote: Hi, I was loading data into a partitioned table on the Spark 1.1.0 beeline/Thrift server. The table has complex data types such as map<string,string> and array<map<string,string>>. The query is like "insert overwrite table a partition (…) select …", and the select clause worked when run separately. However, when running the insert query, there was an error as follows. The source code of Cast.scala seems to handle only the primitive data types, which is perhaps why the MatchError was thrown. I just wonder if this is still work in progress, or whether I should do it differently.
Thanks, Du

scala.MatchError: MapType(StringType,StringType,true) (of class org.apache.spark.sql.catalyst.types.MapType)
org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
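The misalignment described above can be illustrated with a minimal sketch (Spark 1.1 HiveContext; the table and column names here are invented for illustration and are not from the thread):

```scala
// Hypothetical sketch: target table a(id STRING, props MAP<STRING,STRING>)
// PARTITIONED BY (dt STRING), populated from a source table src.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext._

// Misaligned: the map column `props` lines up positionally with the STRING
// column `id`, so Catalyst inserts Cast(map -> string) and Cast.scala, which
// only matches primitive types, throws the MatchError seen above:
// sql("INSERT OVERWRITE TABLE a PARTITION (dt='2014-09-26') SELECT props, id FROM src")

// Aligned: the select list matches the target schema's column order exactly.
sql("INSERT OVERWRITE TABLE a PARTITION (dt='2014-09-26') SELECT id, props FROM src")
```

Because Hive-style INSERT matches columns by position rather than by name, the select list must mirror the target schema's declared order.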
view not supported in spark thrift server?
Can anybody confirm whether or not views are currently supported in Spark? I found "create view translate" in the blacklist of HiveCompatibilitySuite.scala, and the following scenario also threw a NullPointerException on beeline/thriftserver (1.1.0). Any plan to support them soon?

create table src(k string, v string);
load data local inpath '/home/y/share/yspark/examples/src/main/resources/kv1.txt' into table src;
create view kv as select k, v from src;
select * from kv;

Error: java.lang.NullPointerException (state=,code=0)
Re: view not supported in spark thrift server?
Views are not supported yet. It's not currently on the near-term roadmap, but that can change if there is sufficient demand or someone in the community is interested in implementing them. I do not think it would be very hard. Michael

On Sun, Sep 28, 2014 at 11:59 AM, Du Li l...@yahoo-inc.com.invalid wrote: Can anybody confirm whether or not views are currently supported in Spark? I found "create view translate" in the blacklist of HiveCompatibilitySuite.scala, and the following scenario also threw a NullPointerException on beeline/thriftserver (1.1.0). Any plan to support them soon?

create table src(k string, v string);
load data local inpath '/home/y/share/yspark/examples/src/main/resources/kv1.txt' into table src;
create view kv as select k, v from src;
select * from kv;

Error: java.lang.NullPointerException (state=,code=0)
Re: view not supported in spark thrift server?
Thanks, Michael, for your quick response. Views are critical for my project, which is migrating from Shark to Spark SQL. I have implemented and tested everything else. It would be perfect if views could be implemented soon. Du

From: Michael Armbrust mich...@databricks.com
Date: Sunday, September 28, 2014 at 12:13 PM
To: Du Li l...@yahoo-inc.com.invalid
Cc: dev@spark.apache.org, u...@spark.apache.org
Subject: Re: view not supported in spark thrift server?

Views are not supported yet. It's not currently on the near-term roadmap, but that can change if there is sufficient demand or someone in the community is interested in implementing them. I do not think it would be very hard. Michael

On Sun, Sep 28, 2014 at 11:59 AM, Du Li l...@yahoo-inc.com.invalid wrote: Can anybody confirm whether or not views are currently supported in Spark? I found "create view translate" in the blacklist of HiveCompatibilitySuite.scala, and the following scenario also threw a NullPointerException on beeline/thriftserver (1.1.0). Any plan to support them soon?

create table src(k string, v string);
load data local inpath '/home/y/share/yspark/examples/src/main/resources/kv1.txt' into table src;
create view kv as select k, v from src;
select * from kv;

Error: java.lang.NullPointerException (state=,code=0)
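Until views land, one possible stopgap for simple, non-persistent views is a temporary table, sketched below (a hedged workaround, not an official substitute; it uses the Spark 1.1 `registerTempTable` API and the table/column names from the repro in this thread):

```scala
// Workaround sketch: a session-scoped temporary table standing in for a view.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext._

// Instead of: CREATE VIEW kv AS SELECT k, v FROM src
sql("SELECT k, v FROM src").registerTempTable("kv")

// Subsequent queries can read from "kv" as if it were a view.
sql("SELECT * FROM kv").collect().foreach(println)
```

The key caveat is scope: unlike a Hive view, the temporary table is not persisted to the metastore, so it exists only for the lifetime of the registering context and is not visible to other Thrift server clients.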
Spark meetup on Oct 15 in NYC
Hi Spark users and developers, Some of the most active Spark developers (including Matei Zaharia, Michael Armbrust, Joseph Bradley, TD, Paco Nathan, and me) will be in NYC for Strata NYC. We are working with the Spark NYC meetup group and Bloomberg to host a meetup event. This might be the event with the highest committer-to-user ratio in the history of user meetups. We look forward to meeting more users in NYC. You can sign up here: http://www.meetup.com/Spark-NYC/events/209271842/ Cheers.