kishred opened a new issue #5388:
URL: https://github.com/apache/incubator-pinot/issues/5388


   Batch ingestion takes 26 minutes for a 570 MB Parquet file. The data contains
3.5 million rows and 16 columns (see the table below for details). The slowness
does not appear to be specific to the input format: ingesting the same data as
CSV and Avro took a similar amount of time.
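
For scale, the reported figures can be turned into a rough throughput. This is only a back-of-the-envelope sketch: it assumes the 26 minutes is wall-clock time for the whole job and that the row count and file size are the rounded values quoted above. The function and variable names are illustrative.

```python
# Back-of-the-envelope throughput from the numbers reported above.
ROWS = 3_500_000   # rows in the input (rounded, as reported)
FILE_MB = 570      # Parquet file size in MB
MINUTES = 26       # reported ingestion time, assumed to be wall-clock


def throughput(rows, file_mb, minutes):
    """Return (rows per second, MB per second) for one ingestion run."""
    seconds = minutes * 60
    return rows / seconds, file_mb / seconds


rows_per_s, mb_per_s = throughput(ROWS, FILE_MB, MINUTES)
print(f"{rows_per_s:.0f} rows/s, {mb_per_s:.2f} MB/s")
# prints "2244 rows/s, 0.37 MB/s"
```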
   
   | Column | Type | Structure | Size | Cardinality | Max Length |
   |---|---|---|---|---|---|
   | 1 | STRING | fixed bytes value dictionary | 5 | 1 | 15 |
   | 2 | STRING | fixed bytes value dictionary | 222825 | 8913 | 25 |
   | 3 | STRING | fixed bytes value dictionary | 571140 | 38076 | 15 |
   | 4 | STRING | fixed bytes value dictionary | 28 | 4 | 7 |
   | 5 | STRING | fixed bytes value dictionary | 910 | 35 | 26 |
   | 6 | STRING | fixed bytes value dictionary | 1020 | 85 | 12 |
   | 7 | STRING | fixed bytes value dictionary | 110 | 11 | 10 |
   | 8 | STRING | fixed bytes value dictionary | 40 | 5 | 8 |
   | 9 | STRING | fixed bytes value dictionary | 25 | 5 | 5 |
   | 10 | STRING | fixed bytes value dictionary | 4553 | 157 | 29 |
   | 11 | LONG | dictionary | | 35943781 | |
   | 12 | INT | dictionary | | 1 | |
   | 13 | INT | dictionary | | 85 | |
   | 14 | INT | dictionary | | 63971 | |
   | 15 | INT | dictionary | | 3 | |
   | 16 | INT | dictionary | | 64784 | |
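
As a sanity check on the table, the Size values for the STRING columns look like cardinality multiplied by max length, which is what a fixed-width dictionary would occupy. That reading is an inference from the numbers, not something the report states. The sketch below, with illustrative names, lists any rows where the relationship does not hold.

```python
# (column, size, cardinality, max_length) for the STRING rows of the table.
STRING_COLUMNS = [
    (1, 5, 1, 15),
    (2, 222825, 8913, 25),
    (3, 571140, 38076, 15),
    (4, 28, 4, 7),
    (5, 910, 35, 26),
    (6, 1020, 85, 12),
    (7, 110, 11, 10),
    (8, 40, 5, 8),
    (9, 25, 5, 5),
    (10, 4553, 157, 29),
]


def mismatches(columns):
    """Return (column, reported size, cardinality * max length) where they differ."""
    return [
        (col, size, card * max_len)
        for col, size, card, max_len in columns
        if size != card * max_len
    ]


print(mismatches(STRING_COLUMNS))
# prints [(1, 5, 15)]
```

Under that assumption, only column 1 is inconsistent (reported size 5 against an expected 15), so one of its three values may have been mistyped.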
   