Re: Spark on Kudu

2016-06-20 Thread Benjamin Kim
Dan, Out of curiosity, I was looking through the spark-csv code on GitHub and tried to see what makes it work for the “CREATE TABLE” statement, while it doesn’t work for spark-kudu. There are differences in the way the two are done, CsvRelation vs. KuduRelation. I’m still learning how this works…

Re: Spark on Kudu

2016-06-17 Thread Benjamin Kim
> To your first question about `CREATE TABLE` syntax with Kudu/Spark SQL, I do not think we support that at this point. I haven't looked deeply into it, but we may hit issues specifying Kudu-specific options (partitioning, column encoding, etc.). Probably issues that can be worked through…

Re: Spark on Kudu

2016-06-17 Thread Dan Burkert
Hi Ben, To your first question about `CREATE TABLE` syntax with Kudu/Spark SQL, I do not think we support that at this point. I haven't looked deeply into it, but we may hit issues specifying Kudu-specific options (partitioning, column encoding, etc.). Probably issues that can be worked through…

Re: Spark on Kudu

2016-06-17 Thread Benjamin Kim
    import java.util.UUID
    val generateUUID = udf(() => UUID.randomUUID().toString)

This is what I am using. I know auto-incrementing is coming down the line (don’t know when), but out of curiosity, is there a way to simulate this in Kudu using Spark? Thanks, Ben > On Jun 14, 2016, at 6:08 PM, Dan Burkert wrote: …
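A minimal sketch of how that UDF can simulate a generated key, assuming a spark-shell session with an existing DataFrame `df`; the column name "uuid_key" is illustrative, and random UUIDs only approximate (not replace) a real auto-increment:

    import java.util.UUID
    import org.apache.spark.sql.functions.udf

    // UDF producing a random UUID string per row, standing in for the
    // auto-incrementing key that Kudu does not yet offer.
    val generateUUID = udf(() => UUID.randomUUID().toString)

    // Attach the generated key as a new column; the result can then be
    // written to a Kudu table whose primary key column is "uuid_key".
    val withKey = df.withColumn("uuid_key", generateUUID())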

Re: Spark on Kudu

2016-06-15 Thread Benjamin Kim
gt;>>> try: >>>>> >>>>> import org.kududb.client._ >>>>> and try again? >>>>> >>>>> - Dan >>>>> >>>>> On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim <bbuil...@gmail.com

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
…try: `import org.kududb.client._` and try again? - Dan On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > I encountered an error trying to…

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
…wrote: > I encountered an error trying to create a table based on the documentation from a DataFrame.

    <console>:49: error: not found: type CreateTableOptions
    kuduContext.createTable(tableName, df.schema, Seq("key"),
      new CreateTableOptions().setNumReplicas(1))

…

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
…error: not found: type CreateTableOptions

    kuduContext.createTable(tableName, df.schema, Seq("key"),
      new CreateTableOptions().setNumReplicas(1))

Is there something I’m missing? Thanks, Ben …
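Dan's fix upthread combined with the call above gives the following sketch, assuming a spark-shell session where `kuduContext`, `tableName`, and `df` already exist:

    // Brings CreateTableOptions (and the rest of the Kudu client API) into
    // scope; without it the shell reports "not found: type CreateTableOptions".
    import org.kududb.client._

    // Create a Kudu table from the DataFrame's schema, with "key" as the
    // primary key column and one replica (only sensible on a test cluster).
    kuduContext.createTable(tableName, df.schema, Seq("key"),
      new CreateTableOptions().setNumReplicas(1))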

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
> It's only in Cloudera's Maven repo: https://repository.cloudera.com/cloudera/cloudera-repos/org/kududb/kudu-spark_2.10/0.9.0/ …

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
…wrote: > It's only in Cloudera's Maven repo: https://repository.cloudera.com/cloudera/cloudera-repos/org/kududb/kudu-spark_2.10/0.9.0/ J-D On Tue, Jun 14, 2016 at 2:59 PM, Benjamin Kim <bbuil...@gmail.com> wrote: …
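One way to pull that artifact into spark-shell, sketched on the assumption that Spark's `--repositories`/`--packages` resolution is available and that the coordinates follow from the repo path (org.kududb:kudu-spark_2.10:0.9.0):

    spark-shell \
      --repositories https://repository.cloudera.com/cloudera/cloudera-repos \
      --packages org.kududb:kudu-spark_2.10:0.9.0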

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
…<https://repository.cloudera.com/cloudera/cloudera-repos/org/kududb/kudu-spark_2.10/0.9.0/> J-D On Tue, Jun 14, 2016 at 2:59 PM, Benjamin Kim <bbuil...@gmail.com> wrote: Hi J-D, I installed…

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
…<bbuil...@gmail.com> wrote: > Hi J-D, I installed Kudu 0.9.0 using CM, but I can’t find the kudu-spark jar for spark-shell to use. Can you show me where to find it? Thanks, Ben > On Jun 8, 2016, at 1:19 PM, Jean-Daniel Cryans…

Re: Spark on Kudu

2016-06-08 Thread Jean-Daniel Cryans
What's in this doc is what's gonna get released: https://github.com/cloudera/kudu/blob/master/docs/developing.adoc#kudu-integration-with-spark J-D On Tue, Jun 7, 2016 at 8:52 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Will this be documented with examples once 0.9.0 comes out? …
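For digest readers, a minimal sketch of the read path that doc describes, assuming the kudu-spark 0.9.0 data source; the master address and table name are placeholders:

    // Load a Kudu table as a DataFrame through the kudu-spark data source.
    val df = sqlContext.read
      .options(Map("kudu.master" -> "kudu-master:7051", "kudu.table" -> "my_table"))
      .format("org.kududb.spark.kudu")
      .load()
    df.show()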

Re: Spark on Kudu

2016-05-28 Thread Jean-Daniel Cryans
…an updatable data store for a long time that can be quickly queried directly either using Spark SQL or Impala or some other SQL engine and still handle TB or PB of data without per…

Re: Spark on Kudu

2016-05-28 Thread Benjamin Kim
…to remain unchanged from then on, and a new set of preliminary values for the current window need to be added/appended. Using Kudu's Java API and developing additional functionality on top of what Kudu has to offer isn’t…

Re: Spark on Kudu

2016-05-18 Thread Chris George
…Mark Hamstra <m...@clearstorydata.com> wrote: > Do they care being able to insert into Kudu with SparkSQL? I care about insert into Kudu with Spark SQL. I'm currently delaying a refactoring of some Spark SQL-oriented insert functionality while trying to e…

Re: Spark on Kudu

2016-05-18 Thread Benjamin Kim
…SQL will gate how quickly we would move to using Kudu and how seriously we'd look at alternatives before making that decision. On Mon, Apr 11, 2016 at 8:14 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote: …

Re: Spark on Kudu

2016-04-13 Thread Benjamin Kim
…Kudu and how seriously we'd look at alternatives before making that decision. On Mon, Apr 11, 2016 at 8:14 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote: Mark, Thanks for taking some…

Re: Spark on Kudu

2016-04-12 Thread Benjamin Kim
…wrote: > Mark, Thanks for taking some time to reply in this thread, glad it caught the attention of other folks! On Sun, Apr 10, 2016 at 12:33 PM, Mark Hamstra <m...@clearstorydata.com> wrote: > Do they care being able to insert into Kudu with SparkSQL…

Re: Spark on Kudu

2016-04-11 Thread Jean-Daniel Cryans
Ben, Thanks for the additional information. You know, I was expecting that querying would be the most important part and writing into Kudu was secondary since it can easily be done with the Java API, but you guys are proving me wrong. I'm starting to think we should host a Spark + Kudu hackathon…
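As a sketch of the Java-API write path J-D mentions, here in Scala against the pre-1.0 org.kududb.client package; the master address, table, and column names are assumptions, and error handling is omitted:

    import org.kududb.client.KuduClient

    // Connect to an (illustrative) Kudu master and open an existing table.
    val client  = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    val table   = client.openTable("my_table")
    val session = client.newSession()

    // Build and apply a single-row insert.
    val insert = table.newInsert()
    insert.getRow.addString("key", "row-1")
    session.apply(insert)

    // Flush pending operations and release resources.
    session.close()
    client.close()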

Re: Spark on Kudu

2016-04-11 Thread Jean-Daniel Cryans
Mark, Thanks for taking some time to reply in this thread, glad it caught the attention of other folks! On Sun, Apr 10, 2016 at 12:33 PM, Mark Hamstra <m...@clearstorydata.com> wrote: > Do they care being able to insert into Kudu with SparkSQL? I care about insert into Kudu with Spark SQL…

Re: Spark on Kudu

2016-04-10 Thread Benjamin Kim
…Mobile: +1 818 635 2900 | 3250 Ocean Park Blvd, Suite 200 | Santa Monica, CA 90405 | www.amobee.com > On Feb 24, 2016, at 3:51 PM, Jean-Daniel Cryans…

Re: Spark on Kudu

2016-03-01 Thread Benjamin Kim
…basic unit tests that others can easily extend. None of us on the team are Spark experts, but we'd be really happy to assist one improve the kudu-spark code. J-D On Wed, Feb 24, 2016 at 3:41 PM, Benjamin Kim <bbuil...@gmail.com> wrote: J-D, It looks like…

Re: Spark on Kudu

2016-02-24 Thread Jean-Daniel Cryans
…to contribute to. We have some basic unit tests that others can easily extend. None of us on the team are Spark experts, but we'd be really happy to assist one improve the kudu-spark code. J-D On Wed, Feb 24, 2016 at 3:41 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > J-D, …

Re: Spark on Kudu

2016-02-24 Thread Benjamin Kim
…Spark with Kudu and compare it to HBase with Spark (not clean). Thanks, Ben > On Feb 24, 2016, at 3:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote: > AFAIK no one is working on it, but we did manage to get this in for 0.7.0: https://issues.cloudera.org/browse/KUDU-1321…

Re: Spark on Kudu

2016-02-24 Thread Jean-Daniel Cryans
AFAIK no one is working on it, but we did manage to get this in for 0.7.0: https://issues.cloudera.org/browse/KUDU-1321 It's a really simple wrapper, and yes you can use SparkSQL on Kudu, but it will require a lot more work to make it fast/useful. Hope this helps, J-D On Wed, Feb 24, 2016 at…
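To make "you can use SparkSQL on Kudu" concrete, a small hedged sketch that builds on a Kudu-backed DataFrame `df` (such as the data-source read sketched earlier); the table and column names are placeholders:

    // Register the DataFrame and query it with Spark SQL (Spark 1.x API).
    df.registerTempTable("kudu_events")
    sqlContext.sql("SELECT key FROM kudu_events WHERE key LIKE 'a%'").show()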

Spark on Kudu

2016-02-24 Thread Benjamin Kim
I see KUDU-1214 targeted for 0.8.0, but I see no progress on it. When it is complete, will this mean that Spark will be able to work with Kudu both programmatically and as a client via Spark SQL? Or is there more work that needs to be done?