Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-21 Thread Jean-Daniel Cryans
(with my vendor hat on)

From a Cloudera perspective, support for Kudu is still in beta. We offer
the bits with no guarantees. If you have more questions regarding parcels,
CM, etc., please direct them to
http://community.cloudera.com/t5/Beta-Releases-Apache-Kudu/bd-p/Beta

Thanks!

J-D

On Wed, Sep 21, 2016 at 11:11 AM, Benjamin Kim wrote:

> I tried installing using Cloudera Manager and noticed that the
> documentation doesn’t state the URL to enter in the Parcel Settings. So, I
> just re-used the old one for the beta, but there is an annoying reminder
> that Kudu is still beta. Is there a new parcel URL that is not for the beta?
>
> Thanks,
> Ben
>
>
> On Sep 20, 2016, at 11:23 PM, Matteo Durighetto wrote:
>
>
> 2016-09-20 9:11 GMT+02:00 Todd Lipcon:
>
>> The Apache Kudu team is happy to announce the release of Kudu 1.0.0!
>>
>> Kudu is an open source storage engine for structured data which supports
>> low-latency random access together with efficient analytical access
>> patterns. It is designed within the context of the Apache Hadoop ecosystem
>> and supports many integrations with other data analytics projects both
>> inside and outside of the Apache Software Foundation.
>>
>> This latest version adds several new features, including:
>>
>> - Removal of multiversion concurrency control (MVCC) history is now
>> supported. This allows Kudu to reclaim disk space, where previously Kudu
>> would keep a full history of all changes made to a given table since the
>> beginning of time.
>>
>> - Most of Kudu’s command line tools have been consolidated under a new
>> top-level "kudu" tool. This reduces the number of large binaries
>> distributed with Kudu and also includes much-improved help output.
>>
>> - Administrative tools including "kudu cluster ksck" now support running
>> against multi-master Kudu clusters.
>>
>> - The C++ client API now supports writing data in AUTO_FLUSH_BACKGROUND
>> mode. This can provide higher throughput for ingest workloads.
>>
>> This release also includes many bug fixes, optimizations, and other
>> improvements, detailed in the release notes available at:
>> http://kudu.apache.org/releases/1.0.0/docs/release_notes.html
>>
>> Download the source release here:
>> http://kudu.apache.org/releases/1.0.0/
>>
>> Convenience binary artifacts for the Java client and various Java
>> integrations (e.g. Spark, Flume) are also now available via the ASF Maven
>> repository.
>>
>> Enjoy the new release!
>>
>> - The Apache Kudu team
>>
>
>
> Really great. Moreover, there is a new producer in the Kudu Flume sink,
> the regexp Kudu producer:
>
> https://github.com/cloudera/kudu/blob/master/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
>
> With the regexp Kudu producer, it is simple to parse events with a regular
> expression and write the resulting records into Kudu tables:
>
>  * A regular expression serializer that generates one {@link Insert} or
>  * {@link Upsert} per {@link Event} by parsing the payload into values using a
>  * regular expression. Values are coerced to the proper column types.
>  *
>  * Example: if the Kudu table has the schema
>  *
>  * key INT32
>  * name STRING
>  *
>  * and producer.pattern is '(?<key>\\d+),(?<name>\\w+)', then the
>  * RegexpKuduOperationsProducer will parse the string
>  *
>  * |12345,Mike||54321,Todd|
>  *
>  * into the rows (key=12345, name=Mike) and (key=54321, name=Todd).
>
> We are just testing it, and it's working.
>
> Kind Regards
>
> Matteo Durighetto
> e-mail: m.durighe...@miriade.it
>
>
>
>
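
The parsing behaviour described in the quoted Javadoc can be sketched with
plain java.util.regex. This is only an illustration, not the Flume sink's
actual code: it assumes the pattern's capture groups are named after the
columns key and name, and the class RegexpRowSketch and its parse method are
hypothetical names that use a Map as a stand-in for a Kudu Insert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: mimics the described behaviour without any Kudu or Flume classes.
class RegexpRowSketch {
    // Assumption: each named group corresponds to a column of the example schema.
    private static final Pattern PATTERN =
            Pattern.compile("(?<key>\\d+),(?<name>\\w+)");

    // Returns one row (column name -> coerced value) per match found in the payload.
    static List<Map<String, Object>> parse(String payload) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Matcher m = PATTERN.matcher(payload);
        while (m.find()) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("key", Integer.parseInt(m.group("key"))); // coerced to INT32
            row.put("name", m.group("name"));                 // kept as STRING
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints {key=12345, name=Mike} and then {key=54321, name=Todd}
        for (Map<String, Object> row : parse("|12345,Mike||54321,Todd|")) {
            System.out.println(row);
        }
    }
}
```

Because the matcher is advanced with find(), a single payload can yield
several rows, which matches the two-row result in the quoted example.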


Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Benjamin Kim
Todd,

Thanks. I’ll look into those.

Cheers,
Ben





Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Todd Lipcon
-announce


On Tue, Sep 20, 2016 at 11:34 AM, Benjamin Kim wrote:

> This is awesome!!! Great!!!
>
> Do you know if any improvements were also made to the Spark plugin jar?
>

Looks like a few changes based on the git log:
https://gist.github.com/4fa3ccc3b9be787227fed89c1bd42837

as well as a number of changes to the Java client (which gets pulled into
the Spark jar):
https://gist.github.com/e2a8ca78e51773fabb70aae34207199f


In particular, I think the partition pruning work in the Java client should
reduce the number of Spark partitions if you have predicates on your data
frames (though I haven't personally verified it).

-Todd





-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Benjamin Kim
This is awesome!!! Great!!!

Do you know if any improvements were also made to the Spark plugin jar?

Thanks,
Ben




Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Aminul Islam
Congrats
On Sep 20, 2016 9:35 PM, "Jacques Nadeau" wrote:

> Congrats to everyone. This is a great accomplishment!


Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Jacques Nadeau
Congrats to everyone. This is a great accomplishment!



[ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Todd Lipcon
The Apache Kudu team is happy to announce the release of Kudu 1.0.0!

Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports many integrations with other data analytics projects both
inside and outside of the Apache Software Foundation.

This latest version adds several new features, including:

- Removal of multiversion concurrency control (MVCC) history is now
supported. This allows Kudu to reclaim disk space, where previously Kudu
would keep a full history of all changes made to a given table since the
beginning of time.

- Most of Kudu’s command line tools have been consolidated under a new
top-level "kudu" tool. This reduces the number of large binaries
distributed with Kudu and also includes much-improved help output.

- Administrative tools including "kudu cluster ksck" now support running
against multi-master Kudu clusters.

- The C++ client API now supports writing data in AUTO_FLUSH_BACKGROUND
mode. This can provide higher throughput for ingest workloads.

This release also includes many bug fixes, optimizations, and other
improvements, detailed in the release notes available at:
http://kudu.apache.org/releases/1.0.0/docs/release_notes.html

Download the source release here:
http://kudu.apache.org/releases/1.0.0/

Convenience binary artifacts for the Java client and various Java
integrations (e.g. Spark, Flume) are also now available via the ASF Maven
repository.

Enjoy the new release!

- The Apache Kudu team