2015-04-04 20:01 GMT+02:00 Ryan Blue <[email protected]>:

> Did you also set the row group size? It looks like this row group is
> ~103MB, which doesn't make sense with your block size (unless I'm reading
> the output wrong). I'm not really sure how much block size would matter
> either. The row group will only get processed by a single task even if
> there are multiple "HDFS" blocks covering it.
>
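A note on the row group question above: in parquet-mr the row group size is what the "block size" setting controls, so it is the same knob that was already being tuned. Below is a minimal sketch of how these writer settings are typically passed through a Hadoop Configuration; the keys are the standard parquet-mr ones, but how they reach the writer depends on the job setup, so treat this as an illustration rather than the exact configuration used in this thread.

    import org.apache.hadoop.conf.Configuration

    // Sketch: standard parquet-mr writer keys with the values discussed in
    // this thread. "Block size" is the row group size, i.e. how much column
    // data is buffered before a row group is flushed to the file.
    val conf = new Configuration()
    conf.setInt("parquet.block.size", 32 * 1024 * 1024)     // 32 MB row groups
    conf.setInt("parquet.page.size", 16 * 1024)             // 16 KB data pages
    conf.setInt("parquet.dictionary.page.size", 16 * 1024)  // 16 KB dictionary pages
    conf.setBoolean("parquet.enable.dictionary", true)      // keep dictionary encoding on
    conf.set("parquet.compression", "LZO")                  // page compression codec

A row group that comes out at ~103 MB despite a 32 MB setting would suggest the block size key never reached the writer, which matches the puzzle raised above and is worth double-checking in the job setup.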
I didn't know we could configure the row group size; which option is it? I
only configured compression, block size, page size and dictionary page size.

> How did you arrive at 16KB for page size?

Mainly by following the configuration guidelines here:
http://parquet.incubator.apache.org/documentation/latest/. The default is
around 1 MB, but I don't have the impression that changing any of those made
a big difference in my situation.

For the configuration that performed best, I coalesced the output into a
small number of parts of around 32 MB each, so that each part matches the
block size configured for Parquet.

> rb
>
> On 04/03/2015 09:52 AM, Eugen Cepoi wrote:
>
>> Here is one of the results. It is for the execution with the config I was
>> expecting to perform the best based on my sampled data.
>>
>> Compression: LZO, page size and dictionary page size: 16KB, block size
>> 32 MB; there are 32 parts for a total of 911M on S3 (so a single file is
>> in fact less than 32 MB). I am not sure the block size actually matters
>> that much, since the data is on S3 and not HDFS... :(
>>
>> When I just get all the fields it is much worse than with raw Thrift. If I
>> select one nested field (foo/** where foo has only 2 leaves) and a few
>> direct leaves, then performance is similar to getting all fields without
>> any filter.
>> When selecting only ~5 leaves, performance is similar to raw Thrift.
>>
>> Thanks!
>>
>> row group 1: RC:283052 TS:107919094 OFFSET:4
>> --------------------------------------------------------------------------------
>> a:        INT64 LZO DO:0 FPO:4 SZ:365710/2213388/6,05 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED
>> b:        INT64 LZO DO:0 FPO:365714 SZ:505835/2228766/4,41 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED
>> c:        BINARY LZO DO:0 FPO:871549 SZ:10376384/11393987/1,10 VC:283052 ENC:PLAIN,BIT_PACKED
>> d:        BINARY LZO DO:0 FPO:11247933 SZ:70986/78575/1,11 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
>> e:        BINARY LZO DO:0 FPO:11318919 SZ:2159/2603/1,21 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> f:        BINARY LZO DO:0 FPO:11321078 SZ:41917/47856/1,14 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
>> g:
>> .g1:      BINARY LZO DO:0 FPO:11362995 SZ:38549/37372/0,97 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> .g2:
>> ..g21:    INT64 LZO DO:0 FPO:11401544 SZ:61882/388906/6,28 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> ..g22:    BINARY LZO DO:0 FPO:11463426 SZ:1144390/7158351/6,26 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> h:
>> .h1:      BINARY LZO DO:0 FPO:12607816 SZ:63896/68688/1,07 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> .h2:
>> ..h21:    INT64 LZO DO:0 FPO:12671712 SZ:1169087/2207025/1,89 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
>> ..h22:    BINARY LZO DO:0 FPO:13840799 SZ:29116/40513/1,39 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> i:
>> .i1:      BINARY LZO DO:0 FPO:13869915 SZ:10933/13648/1,25 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> .i2:
>> ..i21:    INT64 LZO DO:0 FPO:13880848 SZ:11523/17795/1,54 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> ..i22:    BINARY LZO DO:0 FPO:13892371 SZ:135510/248827/1,84 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
>> j:
>> .j1:      BINARY LZO DO:0 FPO:14027881 SZ:37025/35497/0,96 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> .j2:
>> ..j21:    INT64 LZO DO:0 FPO:14064906 SZ:28196/37242/1,32 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
>> ..j22:    BINARY LZO DO:0 FPO:14093102 SZ:945481/6491450/6,87 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> k:        BINARY LZO DO:0 FPO:15038583 SZ:39147/36673/0,94 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
>> l:        BINARY LZO DO:0 FPO:15077730 SZ:58233/60236/1,03 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> m:        BINARY LZO DO:0 FPO:15135963 SZ:28326/30663/1,08 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
>> n:        BINARY LZO DO:0 FPO:15164289 SZ:2223225/26327896/11,84 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> o:        BINARY LZO DO:0 FPO:17387514 SZ:690400/4470368/6,48 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> p:        BINARY LZO DO:0 FPO:18077914 SZ:39/27/0,69 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
>> q:        BINARY LZO DO:0 FPO:18077953 SZ:1099508/7582263/6,90 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> r:        BINARY LZO DO:0 FPO:19177461 SZ:1372666/8752125/6,38 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
>> s:        BINARY LZO DO:0 FPO:20550127 SZ:52878/51840/0,98 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> t:        BINARY LZO DO:0 FPO:20603005 SZ:51548/49339/0,96 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
>> u:
>> .map:
>> ..key:    BINARY LZO DO:0 FPO:20654553 SZ:75794/85569/1,13 VC:291795 ENC:PLAIN_DICTIONARY,RLE
>> ..value:  BINARY LZO DO:0 FPO:20730347 SZ:58334/62448/1,07 VC:291795 ENC:PLAIN_DICTIONARY,RLE
>> v:
>> .map:
>> ..key:    BINARY LZO DO:0 FPO:20788681 SZ:1072311/2977966/2,78 VC:2674014 ENC:PLAIN_DICTIONARY,RLE
>> ..value:  BINARY LZO DO:0 FPO:21860992 SZ:6997331/24721192/3,53 VC:2674014 ENC:PLAIN_DICTIONARY,PLAIN,RLE
>>
>>
>> 2015-04-03 18:22 GMT+02:00 Eugen Cepoi <[email protected]>:
>>
>>> Hey Ryan,
>>>
>>> 2015-04-03 18:00 GMT+02:00 Ryan Blue <[email protected]>:
>>>
>>>> On 04/02/2015 07:38 AM, Eugen Cepoi wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I was testing Parquet with Thrift to see if there would be an
>>>>> interesting performance gain compared to using just Thrift. But in my
>>>>> test I found that just using plain Thrift with LZO compression was
>>>>> faster.
>>>>>
>>>> This doesn't surprise me too much because of how the Thrift object model
>>>> works. (At least, assuming I understand it right. Feel free to correct
>>>> me.)
>>>>
>>>> Thrift wants to read and write using the TProtocol, which provides a
>>>> layer like Parquet's Converters that is an intermediary between the
>>>> object model and underlying encodings. Parquet implements TProtocol by
>>>> building a list of the method calls a record will make to read or write
>>>> itself, then allowing the record to read that list. I think this has the
>>>> potential to slow down reading and writing.
>>>>
>>>> It's on my todo list to try to get this working using avro-thrift, which
>>>> sets the fields directly.
>>>>
>>> Yes, the double "ser/de" overhead makes sense to me, but I was not
>>> expecting such a big difference.
>>> I didn't read the code doing the conversion, but with Thrift we can set
>>> the fields directly, at least if what you mean is setting them without
>>> reflection.
>>> So basically one can just create an "empty" instance via the default
>>> constructor (through reflection) and then use the setFieldValue method
>>> with the corresponding _Fields value (an enum) and the value to set. We
>>> can even reuse those instances.
>>> I think this would perform better than using avro-thrift, which adds
>>> another layer. If you can point me to the code of interest I can maybe
>>> be of some help :)
>>>
>>> Does the Avro-based implementation perform much better?
>>>
>>>> That's just to see if it might be faster constructing the records
>>>> directly, since we rely on TProtocol to make both thrift and scrooge
>>>> objects work.
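A minimal sketch of the "set the fields directly" idea discussed above: every Thrift-generated Java struct implements org.apache.thrift.TBase and exposes a generated _Fields enum together with setFieldValue and clear. MyRecord and its fields below are hypothetical stand-ins for a real generated class, so this is only an illustration of the technique, not code from parquet-thrift.

    // MyRecord, NAME and COUNT are hypothetical; substitute a real
    // Thrift-generated struct and its _Fields constants.
    def fill(record: MyRecord, name: String, count: Long): MyRecord = {
      record.clear() // reset all fields so the same instance can be reused per row
      // setFieldValue takes the generated _Fields constant and the value, so the
      // record is populated directly instead of replaying TProtocol events.
      record.setFieldValue(MyRecord._Fields.NAME, name)
      record.setFieldValue(MyRecord._Fields.COUNT, java.lang.Long.valueOf(count))
      record
    }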
>>>>> I used a small EMR cluster with 2 m3.xlarge core nodes.
>>>>> The sampled input has 9 million records, about 1 GB (on S3), with ~20
>>>>> fields and some nested structures and maps. I just do a count on it.
>>>>> I tried playing with different tuning options but none seemed to really
>>>>> improve things (the pic shows some global metrics for the different
>>>>> options).
>>>>>
>>>>> I also tried with a larger sample of about a couple of gigabytes (output
>>>>> once compressed), but I had similar results.
>>>>>
>>>> Could you post the results of `parquet-tools meta`? I'd like to see what
>>>> your column layout looks like (the final column chunk sizes).
>>>>
>>>> If your data ends up with only a column or two dominating the row group
>>>> and you always select those columns, then you probably wouldn't see an
>>>> improvement. You need at least one "big" column chunk that you're
>>>> ignoring.
>>>>
>>> I'll provide those shortly. BTW I had some warnings indicating that it
>>> couldn't skip row groups due to predicates or something like that; I'll
>>> try to provide those too.
>>>
>>>> Also, what compression did you use for the Parquet files?
>>>>
>>> LZO; it is also the one I am using for the raw Thrift data.
>>>
>>> Thank you!
>>> Eugen
>>>
>>>>> In the end the only situation I can see where it can perform
>>>>> significantly better is when reading a few columns from a dataset that
>>>>> has a large number of columns. But as the schemas are hand-written, I
>>>>> don't imagine having data structures with hundreds of columns.
>>>>>
>>>> I think we'll know more from taking a look at the row groups and column
>>>> chunk sizes.
>>>>
>>>>> I am wondering if I am doing something wrong (especially given the large
>>>>> difference between plain Thrift and Parquet+Thrift) or if the dataset I
>>>>> used just isn't a good fit for Parquet?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Eugen
>>>>>
>>>> rb
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Cloudera, Inc.
>>>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>
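To illustrate the projection case discussed above (the foo/** example and "reading a few columns"): with parquet-thrift the columns to materialize can be requested as a glob on the read configuration. The key below is assumed to be parquet-thrift's glob-based column filter; key names have varied across Parquet versions, so treat this as a sketch rather than a guaranteed recipe.

    import org.apache.hadoop.conf.Configuration

    // Assumed key: parquet-thrift's glob-based column filter. Only the listed
    // subtree and leaves are assembled; other column chunks can be skipped.
    val readConf = new Configuration()
    readConf.set("parquet.thrift.column.filter", "foo/**;a;b")

Whether this pays off depends on what gets dropped: in the dump above most of the row group sits in a few chunks (e.g. c, n, q, r and v.map.value), so a projection only helps when it excludes some of those, which matches the point about needing at least one "big" column chunk that you're ignoring.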
