Re: [ANNOUNCE] Apache Bahir 2.0.0

2016-08-15 Thread Chris Mattmann
Great work Luciano!



On 8/15/16, 2:19 PM, "Luciano Resende"  wrote:

The Apache Bahir PMC is pleased to announce the release of Apache Bahir
2.0.0, which is our first major release and provides the following
extensions for Apache Spark 2.0.0:

Akka Streaming
MQTT Streaming and Structured Streaming
Twitter Streaming
ZeroMQ Streaming

For more information about Apache Bahir and to download the release:

http://bahir.apache.org

Thanks,

The Apache Bahir PMC




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Have I done everything correctly when subscribing to Spark User List

2016-08-08 Thread Chris Mattmann
Weird!





On 8/8/16, 11:10 AM, "Sean Owen" <so...@cloudera.com> wrote:

>I also don't know what's going on with the "This post has NOT been
>accepted by the mailing list yet" message, because actually the
>messages always do post. In fact this has been sent to the list 4
>times:
>
>https://www.mail-archive.com/search?l=user%40spark.apache.org=dueckm=0=0
>
>On Mon, Aug 8, 2016 at 3:03 PM, Chris Mattmann <mattm...@apache.org> wrote:
>>
>>
>>
>>
>>
>> On 8/8/16, 2:03 AM, "matthias.du...@fiduciagad.de" 
>> <matthias.du...@fiduciagad.de> wrote:
>>
>>>Hello,
>>>
>>>I am writing to you because I am not really sure whether I did everything
>>>correctly when registering and subscribing to the Spark user list.
>>>
>>>I posted the appended question to Spark User list after subscribing and 
>>>receiving the "WELCOME to user@spark.apache.org" mail from 
>>>"user-h...@spark.apache.org".
>>> But this post is still in state "This post has NOT been accepted by the 
>>> mailing list yet.".
>>>
>>>Is this because I forgot to do something or did something wrong with my user
>>>account (dueckm)? Or is it because no member of the Spark User List has
>>>reacted to that post yet?
>>>
>>>Thanks a lot for your help.
>>>
>>>Matthias
>>>
>>>Fiducia & GAD IT AG | www.fiduciagad.de
>>>AG Frankfurt a. M. HRB 102381 | Registered office: Hahnstr. 48, 60528
>>>Frankfurt a. M. | VAT ID DE 143582320
>>>Management Board: Klaus-Peter Bruns (Chairman), Claus-Dieter Toben (Deputy
>>>Chairman),
>>>
>>>Jens-Olaf Bartels, Martin Beyer, Jörg Dreinhöfer, Wolfgang Eckert, Carsten
>>>Pfläging, Jörg Staff
>>>Chairman of the Supervisory Board: Jürgen Brinkmann
>>>
>>>----- Forwarded by Matthias Dück/M/FAG/FIDUCIA/DE on 08.08.2016 10:57
>>>-----
>>>
>>>From: dueckm <matthias.du...@fiduciagad.de>
>>>To: user@spark.apache.org
>>>Date: 04.08.2016 13:27
>>>Subject: Are join/groupBy operations with wide Java Beans using Dataset API
>>>much slower than using RDD API?
>>>
>>>
>>>
>>>
>>>
>>>Hello,
>>>
>>>I built a prototype that uses join and groupBy operations via the Spark RDD API.
>>>Recently I migrated it to the Dataset API. Now it runs much slower than with
>>>the original RDD implementation.
>>>Did I do something wrong here? Or is this a price I have to pay for the more
>>>convenient API?
>>>Is there a known solution to deal with this effect (e.g. configuration via
>>>"spark.sql.shuffle.partitions" - but how could I determine the correct
>>>value)?
>>>In my prototype I use Java Beans with a lot of attributes. Does this slow
>>>down Spark operations with Datasets?
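One way to experiment with "spark.sql.shuffle.partitions" is to pass it at submit time (a sketch; the value 64 is purely illustrative - the default is 200, and a suitable value depends on data volume and cluster resources; the class and jar names below are assumed from the prototype mentioned in this thread):

```shell
# Override the shuffle partition count for the Dataset/SQL path at submit time.
# Re-run with a few different values and compare the job times in the Spark UI.
spark-submit \
  --conf spark.sql.shuffle.partitions=64 \
  --class de.testrddds.JoinGroupByTest \
  JoinGroupByTest.jar
```

The same property can also be set programmatically on the session configuration before the join/groupBy runs, which makes it easier to try several values in one run.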
>>>
>>>Here is a simple example that shows the difference:
>>>JoinGroupByTest.zip
>>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/JoinGroupByTest.zip>
>>>- I build two RDDs and join and group them. Afterwards I count and display the
>>>joined RDDs. (Method de.testrddds.JoinGroupByTest.joinAndGroupViaRDD())
>>>- When I do the same actions with Datasets it takes approximately 40 times
>>>as long (Method de.testrddds.JoinGroupByTest.joinAndGroupViaDatasets()).
>>>
>>>Thank you very much for your help.
>>>Matthias
>>>
>>>PS1: excuse me for sending this post more than once, but I am new to this
>>>mailing list and probably did something wrong when registering/subscribing,
>>>so my previous postings have not been accepted ...
>>>
>>>PS2: See the appended screenshots taken from Spark UI (jobs 0/1 belong to
>>>RDD implementation, jobs 2/3 to Dataset):
>>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/jobs.png>
>>>
>>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_RDD_Details.png>
>>>
>>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_Dataset_Details.png>
>>>
>>>
>>>
>>>
>>>--
>>>View this message in context: 
>>>http://apache-spark-user-list.1001560.n3.nabble.com/Are-join-groupBy-operations-with-wide-Java-Beans-using-Dataset-API-much-slower-than-using-RDD-API-tp27473.html
>>>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



FW: Have I done everything correctly when subscribing to Spark User List

2016-08-08 Thread Chris Mattmann





On 8/8/16, 2:03 AM, "matthias.du...@fiduciagad.de" 
 wrote:

>Hello,
>
>I am writing to you because I am not really sure whether I did everything
>correctly when registering and subscribing to the Spark user list.
>
>I posted the appended question to Spark User list after subscribing and 
>receiving the "WELCOME to user@spark.apache.org" mail from 
>"user-h...@spark.apache.org".
> But this post is still in state "This post has NOT been accepted by the 
> mailing list yet.".
>
>Is this because I forgot to do something or did something wrong with my user
>account (dueckm)? Or is it because no member of the Spark User List has
>reacted to that post yet?
>
>Thanks a lot for your help.
>
>Matthias
>
>Fiducia & GAD IT AG | www.fiduciagad.de
>AG Frankfurt a. M. HRB 102381 | Registered office: Hahnstr. 48, 60528
>Frankfurt a. M. | VAT ID DE 143582320
>Management Board: Klaus-Peter Bruns (Chairman), Claus-Dieter Toben (Deputy
>Chairman),
>
>Jens-Olaf Bartels, Martin Beyer, Jörg Dreinhöfer, Wolfgang Eckert, Carsten
>Pfläging, Jörg Staff
>Chairman of the Supervisory Board: Jürgen Brinkmann
>
>----- Forwarded by Matthias Dück/M/FAG/FIDUCIA/DE on 08.08.2016 10:57
>-----
>
>From: dueckm 
>To: user@spark.apache.org
>Date: 04.08.2016 13:27
>Subject: Are join/groupBy operations with wide Java Beans using Dataset API
>much slower than using RDD API?
>
>
>
>
>
>Hello,
>
>I built a prototype that uses join and groupBy operations via the Spark RDD API.
>Recently I migrated it to the Dataset API. Now it runs much slower than with
>the original RDD implementation.
>Did I do something wrong here? Or is this a price I have to pay for the more
>convenient API?
>Is there a known solution to deal with this effect (e.g. configuration via
>"spark.sql.shuffle.partitions" - but how could I determine the correct
>value)?
>In my prototype I use Java Beans with a lot of attributes. Does this slow
>down Spark operations with Datasets?
>
>Here is a simple example that shows the difference:
>JoinGroupByTest.zip
>
>- I build two RDDs and join and group them. Afterwards I count and display the
>joined RDDs. (Method de.testrddds.JoinGroupByTest.joinAndGroupViaRDD())
>- When I do the same actions with Datasets it takes approximately 40 times
>as long (Method de.testrddds.JoinGroupByTest.joinAndGroupViaDatasets()).
>
>Thank you very much for your help.
>Matthias
>
>PS1: excuse me for sending this post more than once, but I am new to this
>mailing list and probably did something wrong when registering/subscribing,
>so my previous postings have not been accepted ...
>
>PS2: See the appended screenshots taken from Spark UI (jobs 0/1 belong to
>RDD implementation, jobs 2/3 to Dataset):
>
>--
>View this message in context: 
>http://apache-spark-user-list.1001560.n3.nabble.com/Are-join-groupBy-operations-with-wide-Java-Beans-using-Dataset-API-much-slower-than-using-RDD-API-tp27473.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org