Re: Workflow Scheduler for Spark

2014-09-28 Thread Egor Pahomov
I created JIRA https://issues.apache.org/jira/browse/SPARK-3714 and a design
doc on this matter:
https://docs.google.com/document/d/1q2Q8Ux-6uAkH7wtLJpc3jz-GfrDEjlbWlXtf20hvguk/edit?usp=sharing

2014-09-17 22:28 GMT+04:00 Reynold Xin r...@databricks.com:

 There might've been some misunderstanding. I was referring to the MLlib
 pipeline design doc when I said the design doc was posted, in response to
 the first paragraph of your original email.


 On Wed, Sep 17, 2014 at 2:47 AM, Egor Pahomov pahomov.e...@gmail.com
 wrote:

  That doc is about MLlib pipeline functionality. What about an Oozie-like
  workflow?
 
  2014-09-17 13:08 GMT+04:00 Mark Hamstra m...@clearstorydata.com:
 
   See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
   referenced in that JIRA:
   https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
  
   On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov pahomov.e...@gmail.com
   wrote:
  
   I have problems using Oozie. For example, it doesn't sustain a Spark
   context the way the Ooyala job server does. Other than GUI interfaces
   like Hue, it's hard to work with: scoozie stopped development a year ago
   (I spoke with the creator), and Oozie XML is very hard to write.
   Oozie still has all its documentation and code in the MR model rather
   than the YARN model, and based on its current speed of development I
   can't expect radical changes in the near future. There is no Databricks
   for Oozie that would have people on salary to develop that kind of
   radical change. It's a dinosaur.
  
   Reynold, can you help me find this doc? Do you mean just pipelining
   Spark code, or additional logic for persisting tasks, a job server,
   task retry, data availability, and so on?
  
  
   2014-09-17 11:21 GMT+04:00 Reynold Xin r...@databricks.com:
  
Hi Egor,
   
 I think the design doc for the pipeline feature has been posted.

 For the workflow, I believe Oozie actually works fine with Spark if you
 want some external workflow system. Do you have any trouble using that?
   
   
On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov 
  pahomov.e...@gmail.com
wrote:
   
 There are two things we (Yandex) miss in Spark: good MLlib abstractions
 and a good workflow job scheduler. From the threads "Adding abstraction
 in MLlib" and "[mllib] State of Multi-Model training" I got the idea that
 Databricks is working on it and that we should wait for the first posted
 doc, which would guide us.
 What about a workflow scheduler? Is anyone already working on it? Does
 anyone have a plan for doing it?

 P.S. We thought that MLlib abstractions for running multiple algorithms
 over the same data would need such a scheduler, which would rerun an
 algorithm in case of failure. I understand that Spark provides fault
 tolerance out of the box, but we have found an Oozie-like scheduler more
 reliable for such long-living workflows.
   
--
   
   
   
Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex
   
   
   
  
  
 
 






Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-28 Thread Du Li
It turned out to be a bug in my code. In the select clause, the list of
fields was misaligned with the schema of the target table. As a consequence,
the map data couldn't be cast to some other type in the schema.
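
For illustration only (the table and column names here are hypothetical, not
from the original job), this is roughly the kind of misalignment in HiveQL
that lines a map column up against a string slot in the target schema:

```sql
-- Hypothetical target schema: (k string, props map<string,string>)
CREATE TABLE target (k string, props map<string, string>)
PARTITIONED BY (dt string);

-- Misaligned: the SELECT list is in the wrong order for the target schema,
-- so the map value must be cast to string, which triggers the MatchError:
--   INSERT OVERWRITE TABLE target PARTITION (dt = '2014-09-28')
--   SELECT props, k FROM source;

-- Aligned: columns listed in the same order as the target schema.
INSERT OVERWRITE TABLE target PARTITION (dt = '2014-09-28')
SELECT k, props FROM source;
```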

Thanks anyway.


On 9/26/14, 8:08 PM, Cheng Lian lian.cs@gmail.com wrote:

Would you mind providing the DDL of this partitioned table together
with the query you tried? The stack trace suggests that the query was
trying to cast a map into something else, which is not supported in
Spark SQL. And I doubt whether Hive supports casting a complex type to
some other type.

On 9/27/14 7:48 AM, Du Li wrote:
 Hi,

 I was loading data into a partitioned table on Spark 1.1.0
 beeline-thriftserver. The table has complex data types such as
 map<string,string> and array<map<string,string>>. The query is like
 "insert overwrite table a partition (...) select ..." and the select
 clause worked if run separately. However, when running the insert query,
 there was an error as follows.

 The source code of Cast.scala seems to only handle the primitive data
 types, which is perhaps why the MatchError was thrown.
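
 As a standalone sketch (this is NOT the actual Cast.scala, just an
 illustration of the failure mode), a pattern match that covers only
 primitive types throws scala.MatchError when handed a MapType:

```scala
// Standalone sketch -- not the real Catalyst code. It shows how a
// non-exhaustive pattern match over data types fails at runtime.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case class MapType(keyType: DataType, valueType: DataType,
                   valueContainsNull: Boolean) extends DataType

// Builds a cast function for the target type. No case handles MapType,
// so requesting a cast to a map type throws scala.MatchError.
def castFunction(to: DataType): Any => Any = to match {
  case StringType  => (v: Any) => v.toString
  case IntegerType => (v: Any) => v.toString.toInt
}
```

 Calling castFunction(MapType(StringType, StringType, true)) fails with
 scala.MatchError: MapType(StringType,StringType,true), the same shape as
 the error below.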

 I just wonder whether this is still work in progress, or whether I should
 do it differently.

 Thanks,
 Du


 
 scala.MatchError: MapType(StringType,StringType,true) (of class org.apache.spark.sql.catalyst.types.MapType)
  org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
  org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
  org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
  org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
  org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
  org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
  scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
  org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
  org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
  org.apache.spark.scheduler.Task.run(Task.scala:54)
  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  java.lang.Thread.run(Thread.java:722)











view not supported in spark thrift server?

2014-09-28 Thread Du Li

Can anybody confirm whether or not views are currently supported in Spark? I
found "create view translate" in the blacklist of HiveCompatibilitySuite.scala,
and the following scenario also threw a NullPointerException on
beeline/thriftserver (1.1.0). Any plan to support them soon?


 create table src(k string, v string);

 load data local inpath 
 '/home/y/share/yspark/examples/src/main/resources/kv1.txt' into table src;

 create view kv as select k, v from src;

 select * from kv;

Error: java.lang.NullPointerException (state=,code=0)


Re: view not supported in spark thrift server?

2014-09-28 Thread Michael Armbrust
Views are not supported yet. It's not currently on the near-term roadmap,
but that can change if there is sufficient demand or someone in the
community is interested in implementing them. I do not think it would be
very hard.

Michael

On Sun, Sep 28, 2014 at 11:59 AM, Du Li l...@yahoo-inc.com.invalid wrote:





Re: view not supported in spark thrift server?

2014-09-28 Thread Du Li
Thanks, Michael, for your quick response.

Views are critical for my project, which is migrating from Shark to Spark SQL.
I have implemented and tested everything else. It would be perfect if views
could be implemented soon.

Du


From: Michael Armbrust mich...@databricks.com
Date: Sunday, September 28, 2014 at 12:13 PM
To: Du Li l...@yahoo-inc.com.invalid
Cc: dev@spark.apache.org, u...@spark.apache.org
Subject: Re: view not supported in spark thrift server?




Spark meetup on Oct 15 in NYC

2014-09-28 Thread Reynold Xin
Hi Spark users and developers,

Some of the most active Spark developers (including Matei Zaharia, Michael
Armbrust, Joseph Bradley, TD, Paco Nathan, and me) will be in NYC for
Strata NYC. We are working with the Spark NYC meetup group and Bloomberg to
host a meetup event. This might be the event with the highest
committer-to-user ratio in the history of user meetups. We look forward to
meeting more users in NYC.

You can sign up for that here:
http://www.meetup.com/Spark-NYC/events/209271842/

Cheers.