Re: CDH, Impala, Impala KUDU Versioning

2016-06-08 Thread Jean-Daniel Cryans
That's actually a question for Cloudera, so moving user@ to bcc and adding
cdh-user@ back.

On Wed, Jun 8, 2016 at 5:13 PM, Ana Krasteva  wrote:

>
> Forwarding this to Kudu user group.
>
>
>
> On Wed, Jun 8, 2016 at 5:09 PM, Pavan Kulkarni 
> wrote:
>
>> Hi,
>>
>>I am using the latest CDH available 5.7 (
>> https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7/RPMS/x86_64/)
>> and also trying to setup
>> Impala-Kudu (
>> http://archive.cloudera.com/beta/impala-kudu/redhat/6/x86_64/impala-kudu/0.8/RPMS/x86_64/
>> )
>> and
>> KUDU (
>> http://archive.cloudera.com/beta/kudu/redhat/6/x86_64/kudu/0/RPMS/x86_64/
>> )
>>
>> but I see that latest Impala-KUDU is built with CDH 5.8 and latest KUDU
>> is built with CDH 5.4
>>
>> *1. Can I know what exactly is the process/reasoning behind building
>>  Impala-KUDU and KUDU with different CDH?*
>>
>
It's not "built" with CDH; it's more like "built using the CDH build tools
as of a certain version." It's really an implementation detail, and it has
no impact whatsoever on the version of CDH you're actually using, as long
as it's 5.4.3 or later.


> *2. How do I decide on which is the stable CDH to use since different
>> RPMs are built with different CDH?*
>>
>
See 1, and read
http://www.cloudera.com/documentation/betas/kudu/latest/topics/kudu_installation.html#concept_cmn_ngq_dt


>
>> Any response is really appreciated.
>>
>> -- Thanks
>> Pavan Kulkarni
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "CDH Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to cdh-user+unsubscr...@cloudera.org.
>> For more options, visit https://groups.google.com/a/cloudera.org/d/optout
>> .
>>
>
>



Re: Spark on Kudu

2016-06-08 Thread Jean-Daniel Cryans
What's in this doc is what's gonna get released:
https://github.com/cloudera/kudu/blob/master/docs/developing.adoc#kudu-integration-with-spark

J-D

On Tue, Jun 7, 2016 at 8:52 PM, Benjamin Kim  wrote:

> Will this be documented with examples once 0.9.0 comes out?
>
> Thanks,
> Ben
>
>
> On May 28, 2016, at 3:22 PM, Jean-Daniel Cryans 
> wrote:
>
> It will be in 0.9.0.
>
> J-D
>
> On Sat, May 28, 2016 at 8:31 AM, Benjamin Kim  wrote:
>
>> Hi Chris,
>>
>> Will all this effort be rolled into 0.9.0 and be ready for use?
>>
>> Thanks,
>> Ben
>>
>>
>> On May 18, 2016, at 9:01 AM, Chris George 
>> wrote:
>>
>> There is some code in review that needs some more refinement.
>> It will allow upsert/insert from a dataframe using the datasource api. It
>> will also allow the creation and deletion of tables from a dataframe
>> http://gerrit.cloudera.org:8080/#/c/2992/
>>
>> Example usages will look something like:
>> http://gerrit.cloudera.org:8080/#/c/2992/5/docs/developing.adoc
>>
>> -Chris George
>>
>>
>> On 5/18/16, 9:45 AM, "Benjamin Kim"  wrote:
>>
>> Can someone tell me what the state is of this Spark work?
>>
>> Also, does anyone have any sample code on how to update/insert data in
>> Kudu using DataFrames?
>>
>> Thanks,
>> Ben
>>
>>
>> On Apr 13, 2016, at 8:22 AM, Chris George 
>> wrote:
>>
>> SparkSQL cannot support these types of statements, but we may be able to
>> implement similar functionality through the api.
>> -Chris
>>
>> On 4/12/16, 5:19 PM, "Benjamin Kim"  wrote:
>>
>> It would be nice to adhere to the SQL:2003 standard for an “upsert” if it
>> were to be implemented.
>>
>> MERGE INTO table_name USING table_reference ON (condition)
>>  WHEN MATCHED THEN
>>  UPDATE SET column1 = value1 [, column2 = value2 ...]
>>  WHEN NOT MATCHED THEN
>>  INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 …])
>>
>> Cheers,
>> Ben
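The MERGE semantics quoted above can be sketched in plain Python over an in-memory table keyed by primary key. This is only a toy model of the SQL:2003 behavior being requested, not anything Kudu or SparkSQL actually implements:

```python
# Toy model of SQL:2003 MERGE semantics over a dict keyed by primary key.
# WHEN MATCHED -> update the existing row; WHEN NOT MATCHED -> insert it.
# Illustrates the semantics only; it says nothing about atomicity or storage.

def merge_into(table, source_rows, key):
    for row in source_rows:
        k = row[key]
        if k in table:                 # WHEN MATCHED THEN UPDATE
            table[k].update(row)
        else:                          # WHEN NOT MATCHED THEN INSERT
            table[k] = dict(row)
    return table

table = {1: {"id": 1, "name": "a"}}
merge_into(table, [{"id": 1, "name": "a2"}, {"id": 2, "name": "b"}], "id")
assert table[1]["name"] == "a2"   # matched row was updated
assert table[2]["name"] == "b"    # unmatched row was inserted
```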
>>
>> On Apr 11, 2016, at 12:21 PM, Chris George 
>> wrote:
>>
>> I have a wip kuduRDD that I made a few months ago. I pushed it into
>> gerrit if you want to take a look.
>> http://gerrit.cloudera.org:8080/#/c/2754/
>> It does pushdown predicates which the existing input formatter based rdd
>> does not.
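The benefit of pushdown predicates mentioned here can be illustrated abstractly: filtering at the data source means only matching rows cross the wire, while a non-pushdown scan ships everything and filters client-side. This is a hypothetical toy scanner, not the Kudu or Spark API:

```python
# Illustrative sketch only: contrast predicate pushdown (filter at the
# source) with client-side filtering (fetch everything, then filter).
# All names here are hypothetical; this is not the Kudu RDD API.

ROWS = [{"id": i, "value": i * 10} for i in range(1000)]

def scan_without_pushdown(predicate):
    # The source ships every row; the client filters afterwards.
    fetched = list(ROWS)                      # 1000 rows "cross the wire"
    return [r for r in fetched if predicate(r)], len(fetched)

def scan_with_pushdown(predicate):
    # The source applies the predicate itself; only matches are shipped.
    fetched = [r for r in ROWS if predicate(r)]
    return fetched, len(fetched)

pred = lambda r: r["id"] < 5
rows_a, transferred_a = scan_without_pushdown(pred)
rows_b, transferred_b = scan_with_pushdown(pred)
assert rows_a == rows_b          # same result either way
assert transferred_a == 1000     # but very different transfer cost
assert transferred_b == 5
```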
>>
>> Within the next two weeks I’m planning to implement a datasource for
>> spark that will have pushdown predicates and insertion/update functionality
>> (need to look more at cassandra and the hbase datasource for best way to do
>> this) I agree that server side upsert would be helpful.
>> Having a datasource would give us useful data frames and also make spark
>> sql usable for kudu.
>>
>> My reasoning for having a spark datasource and not using Impala is: 1. We
>> have had trouble getting impala to run fast with high concurrency when
>> compared to spark 2. We interact with datasources which do not integrate
>> with impala. 3. We have custom sql query planners for extended sql
>> functionality.
>>
>> -Chris George
>>
>>
>> On 4/11/16, 12:22 PM, "Jean-Daniel Cryans"  wrote:
>>
>> You guys make a convincing point, although on the upsert side we'll need
>> more support from the servers. Right now all you can do is an INSERT then,
>> if you get a dup key, do an UPDATE. I guess we could at least add an API on
>> the client side that would manage it, but it wouldn't be atomic.
>>
>> J-D
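The client-side pattern J-D describes (INSERT first, then UPDATE on a duplicate-key error) can be sketched against a dict-backed store. The store and exception names below are hypothetical stand-ins, not the Kudu client API, and as noted above nothing makes the two steps atomic:

```python
# Sketch of the client-side "insert, then update on duplicate key" pattern.
# Hypothetical Store/DuplicateKeyError names; a real client would surface
# the duplicate as a per-row error, and another writer can interleave
# between the two steps -- this is not an atomic upsert.

class DuplicateKeyError(Exception):
    pass

class Store:
    def __init__(self):
        self.rows = {}
    def insert(self, key, row):
        if key in self.rows:
            raise DuplicateKeyError(key)
        self.rows[key] = row
    def update(self, key, row):
        self.rows[key] = row

def client_side_upsert(store, key, row):
    try:
        store.insert(key, row)        # optimistic insert first
    except DuplicateKeyError:
        store.update(key, row)        # fall back to update on dup key

s = Store()
client_side_upsert(s, 1, {"v": "first"})
client_side_upsert(s, 1, {"v": "second"})   # dup key -> update path
assert s.rows[1] == {"v": "second"}
```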
>>
>> On Mon, Apr 11, 2016 at 9:34 AM, Mark Hamstra 
>> wrote:
>>
>>> It's pretty simple, actually.  I need to support versioned datasets in a
>>> Spark SQL environment.  Instead of a hack on top of a Parquet data store,
>>> I'm hoping (among other reasons) to be able to use Kudu's write and
>>> timestamp-based read operations to support not only appending data, but
>>> also updating existing data, and even some schema migration.  The most
>>> typical use case is a dataset that is updated periodically (e.g., weekly or
>>> monthly) in which the preliminary data in the previous window (week or
>>> month) is updated with values that are expected to remain unchanged from
>>> then on, and a new set of preliminary values for the current window need to
>>> be added/appended.
>>>
>>> Using Kudu's Java API and developing additional functionality on top of
>>> what Kudu has to offer isn't too much to ask, but the ease of integration
>>> with Spark SQL will gate how quickly we would move to using Kudu and how
>>> seriously we'd look at alternatives before making that decision.
>>>
>>> On Mon, Apr 11, 2016 at 8:14 AM, Jean-Daniel Cryans wrote:
>>>
 Mark,

 Thanks for taking some time to reply in this thread, glad it caught the
 attention of other folks!

 On Sun, Apr 10, 2016 at 12:33 PM, Mark Hamstra  wrote:

> Do they care being able to insert into Kudu with SparkSQL
>
>
> I care