I also don't know what's going on with the "This post has NOT been
accepted by the mailing list yet" message, because the messages do in
fact always post. This one has been sent to the list 4 times:

https://www.mail-archive.com/search?l=user%40spark.apache.org&q=dueckm&submit.x=0&submit.y=0

On Mon, Aug 8, 2016 at 3:03 PM, Chris Mattmann <mattm...@apache.org> wrote:
>
> On 8/8/16, 2:03 AM, "matthias.du...@fiduciagad.de" 
> <matthias.du...@fiduciagad.de> wrote:
>
>>Hello,
>>
>>I am writing to you because I am not really sure whether I did everything
>>right when registering and subscribing to the Spark user list.
>>
>>I posted the appended question to the Spark user list after subscribing and
>>receiving the "WELCOME to user@spark.apache.org" mail from
>>"user-h...@spark.apache.org".
>>But this post is still in the state "This post has NOT been accepted by the
>>mailing list yet.".
>>
>>Is this because I forgot to do something or did something wrong with my user
>>account (dueckm)? Or is it because no member of the Spark user list has
>>reacted to that post yet?
>>
>>Thanks a lot for your help.
>>
>>Matthias
>>
>>Fiducia & GAD IT AG | www.fiduciagad.de
>>AG Frankfurt a. M. HRB 102381 | Registered office: Hahnstr. 48, 60528
>>Frankfurt a. M. | VAT ID DE 143582320
>>Management Board: Klaus-Peter Bruns (Chairman), Claus-Dieter Toben (Deputy
>>Chairman), Jens-Olaf Bartels, Martin Beyer, Jörg Dreinhöfer, Wolfgang Eckert,
>>Carsten Pfläging, Jörg Staff
>>Chairman of the Supervisory Board: Jürgen Brinkmann
>>
>>----- Forwarded by Matthias Dück/M/FAG/FIDUCIA/DE on 08.08.2016 10:57
>>-----
>>
>>From: dueckm <matthias.du...@fiduciagad.de>
>>To: user@spark.apache.org
>>Date: 04.08.2016 13:27
>>Subject: Are join/groupBy operations with wide Java Beans using Dataset API
>>much slower than using RDD API?
>>
>>________________________________________
>>
>>
>>
>>Hello,
>>
>>I built a prototype that uses join and groupBy operations via the Spark RDD
>>API. Recently I migrated it to the Dataset API, and now it runs much slower
>>than the original RDD implementation.
>>Did I do something wrong here? Or is this a price I have to pay for the more
>>convenient API?
>>Is there a known solution to deal with this effect (e.g. configuration via
>>"spark.sql.shuffle.partitions" - but how could I determine the correct
>>value)?
>>In my prototype I use Java Beans with a lot of attributes. Does this slow
>>down Spark operations with Datasets?
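>>
>>For illustration, here is a minimal sketch of where that setting would be
>>applied when building the session (the value 50 is only an assumed example;
>>Spark's default is 200, and the right number depends on data volume and
>>cluster size):
>>
>>import org.apache.spark.sql.SparkSession;
>>
>>SparkSession spark = SparkSession.builder()
>>    .appName("JoinGroupByTest")
>>    // Assumed example value: fewer partitions reduce scheduling overhead
>>    // for small shuffles; large shuffles usually need more partitions.
>>    .config("spark.sql.shuffle.partitions", "50")
>>    .getOrCreate();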
>>
>>Here I have a simple example that shows the difference:
>>JoinGroupByTest.zip
>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/JoinGroupByTest.zip>
>>- I build 2 RDDs, then join and group them. Afterwards I count and display
>>the joined RDDs (method de.testrddds.JoinGroupByTest.joinAndGroupViaRDD()).
>>- When I do the same actions with Datasets it takes approximately 40 times
>>as long (method de.testrddds.JoinGroupByTest.joinAndGroupViaDatasets()); a
>>rough sketch of both variants follows below.
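>>
>>To make the comparison concrete, here is a rough, self-contained sketch of
>>what the two variants look like (class and field names such as MyBean are
>>simplified stand-ins, not the exact code from the zip; the real bean has
>>many more attributes):
>>
>>import org.apache.spark.api.java.JavaPairRDD;
>>import org.apache.spark.api.java.JavaSparkContext;
>>import org.apache.spark.sql.Dataset;
>>import org.apache.spark.sql.Encoders;
>>import org.apache.spark.sql.SparkSession;
>>import scala.Tuple2;
>>
>>import java.io.Serializable;
>>import java.util.Arrays;
>>import java.util.List;
>>
>>public class JoinGroupBySketch {
>>
>>    // Stand-in for the wide bean; the real one has many more attributes.
>>    public static class MyBean implements Serializable {
>>        private long id;
>>        private String value;
>>        public MyBean() {}
>>        public MyBean(long id, String value) { this.id = id; this.value = value; }
>>        public long getId() { return id; }
>>        public void setId(long id) { this.id = id; }
>>        public String getValue() { return value; }
>>        public void setValue(String value) { this.value = value; }
>>    }
>>
>>    public static void main(String[] args) {
>>        SparkSession spark = SparkSession.builder()
>>            .appName("JoinGroupBySketch").master("local[*]").getOrCreate();
>>        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
>>
>>        List<MyBean> data = Arrays.asList(new MyBean(1, "a"), new MyBean(2, "b"));
>>
>>        // RDD variant: key the beans by id, join, then group by key.
>>        JavaPairRDD<Long, MyBean> left =
>>            jsc.parallelize(data).mapToPair(b -> new Tuple2<>(b.getId(), b));
>>        JavaPairRDD<Long, MyBean> right =
>>            jsc.parallelize(data).mapToPair(b -> new Tuple2<>(b.getId(), b));
>>        long rddCount = left.join(right).groupByKey().count();
>>
>>        // Dataset variant: bean encoder, join on the key column, group.
>>        Dataset<MyBean> dsLeft = spark.createDataset(data, Encoders.bean(MyBean.class));
>>        Dataset<MyBean> dsRight = spark.createDataset(data, Encoders.bean(MyBean.class));
>>        long dsCount = dsLeft.join(dsRight, dsLeft.col("id").equalTo(dsRight.col("id")))
>>            .groupBy(dsLeft.col("id")).count().count();
>>
>>        System.out.println(rddCount + " / " + dsCount);
>>        spark.stop();
>>    }
>>}
>>
>>Could the cost of encoding the wide bean to and from Spark's internal row
>>format during the shuffle be a plausible explanation for the slowdown?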
>>
>>Thank you very much for your help.
>>Matthias
>>
>>PS1: excuse me for sending this post more than once, but I am new to this
>>mailing list and probably did something wrong when registering/subscribing,
>>so my previous postings have not been accepted ...
>>
>>PS2: See the appended screenshots taken from the Spark UI (jobs 0/1 belong
>>to the RDD implementation, jobs 2/3 to the Dataset one):
>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/jobs.png>
>>
>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_RDD_Details.png>
>>
>><http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_Dataset_Details.png>
>>
>>
>>
>>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
