Re: Spark app write too many small parquet files

2016-12-08 Thread Kevin Tran
artitions > > This controls the number of files generated. > > On 28 Nov 2016 8:29 p.m., "Kevin Tran" <kevin...@gmail.com> wrote: > >> Hi Denny, >> Thank you for your inputs. I also use 128 MB but still too many files >> generated by Spark app which is only ~14

Re: Spark app write too many small parquet files

2016-11-28 Thread Kevin Tran
Ha's presentation > Data > Storage Tips for Optimal Spark Performance > <https://spark-summit.org/2015/events/data-storage-tips-for-optimal-spark-performance/>. > > > On Sun, Nov 27, 2016 at 9:44 PM Kevin Tran <kevin...@gmail.com> wrote: > >> Hi Everyone,

Spark app write too many small parquet files

2016-11-27 Thread Kevin Tran
Hi Everyone, Does anyone know the best practice for writing parquet files from Spark? As the Spark app writes data to parquet, the target directory ends up with heaps of very small parquet files (such as e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each parquet file is only 15KB
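
A common fix (a suggestion, not from the thread itself) is to reduce the number of output partitions with coalesce() or repartition() before the write, since Spark emits one parquet file per partition. A minimal Java sketch: the Spark calls are shown in comments, and the runnable part is a hypothetical helper that picks a partition count for a ~128 MB target file size.

```java
public class PartitionSizing {
    // Hypothetical helper: aim for one output file per targetFileBytes of data.
    // Spark writes one parquet file per partition, so the write becomes:
    //   df.coalesce(targetPartitions(estimatedBytes, 128L << 20))
    //     .write().parquet("/path/out");
    static int targetPartitions(long totalBytes, long targetFileBytes) {
        // Ceiling division, with a floor of 1 partition.
        return (int) Math.max(1, (totalBytes + targetFileBytes - 1) / targetFileBytes);
    }

    public static void main(String[] args) {
        // ~14 MB of data with a 128 MB target -> a single output file.
        System.out.println(targetPartitions(14L << 20, 128L << 20)); // 1
    }
}
```

Estimating the byte size up front is the hard part; in practice people often tune the partition count empirically per job.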

Extract timestamp from Kafka message

2016-09-25 Thread Kevin Tran
Hi Everyone, Does anyone know how we could extract the timestamp from a Kafka message in Spark Streaming? JavaPairInputDStream<String, String> messagesDStream = KafkaUtils.createDirectStream( ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
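
For what it's worth: the Kafka 0.8 direct stream shown above (the StringDecoder-based API) never sees a timestamp, because per-message timestamps only exist on the wire from Kafka 0.10 onward. With the spark-streaming-kafka-0-10 integration, each ConsumerRecord exposes timestamp() in epoch milliseconds. A hedged sketch: the Spark/Kafka calls are comments, and the runnable part just converts such a value.

```java
import java.time.Instant;

public class KafkaTimestampDemo {
    // With spark-streaming-kafka-0-10 (Kafka >= 0.10) the direct stream
    // yields ConsumerRecord objects, so extraction is simply:
    //   JavaDStream<Long> times = stream.map(record -> record.timestamp());
    // Kafka 0.8 messages carry no timestamp; embed one in the payload instead.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        System.out.println(toInstant(0L)); // 1970-01-01T00:00:00Z
    }
}
```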

Add sqldriver.jar to Spark 1.6.0 executors

2016-09-14 Thread Kevin Tran
Hi Everyone, I tried in cluster mode on YARN: * spark-submit --jars /path/sqldriver.jar * --driver-class-path * spark-env.sh SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/path/*" * spark-defaults.conf spark.driver.extraClassPath spark.executor.extraClassPath None of them works for me! Does
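
One combination that is known to work on YARN (paths and the main class below are placeholders): in cluster mode, jars passed with --jars are localized into each container's working directory, so extraClassPath can reference the bare file name.

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /path/sqldriver.jar \
  --conf spark.driver.extraClassPath=sqldriver.jar \
  --conf spark.executor.extraClassPath=sqldriver.jar \
  --class com.example.Main app.jar
```

Note that spark.*.extraClassPath settings prepend entries but do not ship files; --jars (or spark.yarn.dist.jars) does the shipping.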

Re: call() function being called 3 times

2016-09-07 Thread Kevin Tran
] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 12.0 (TID 12). 2518 bytes result sent to driver Does anyone have any ideas? On Wed, Sep 7, 2016 at 7:30 PM, Kevin Tran <kevin...@gmail.com> wrote: > Hi Everyone, > Does anyone know why call() function bei

call() function being called 3 times

2016-09-07 Thread Kevin Tran
Hi Everyone, Does anyone know why the call() function is being called *3 times* for each message that arrives? JavaDStream<String> message = messagesDStream.map(new Function<Tuple2<String, String>, String>() { @Override public String call(Tuple2<String, String> tuple2) { return tuple2._2(); } }
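
The usual cause (an assumption here, since the thread's resolution isn't shown) is that map() is lazy and re-executed once per action on the same DStream/RDD: three actions, three invocations of call(). Caching the mapped stream avoids the recomputation. A plain-Java analogy of the lazy-versus-cached behavior:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RecomputeDemo {
    public static void main(String[] args) {
        AtomicInteger invocations = new AtomicInteger();
        // Like an un-cached RDD: the mapping function re-runs per "action".
        Supplier<String> mapped = () -> {
            invocations.incrementAndGet();
            return "payload";
        };
        mapped.get();            // action 1 (e.g. print())
        mapped.get();            // action 2 (e.g. saveAsTextFiles())
        mapped.get();            // action 3 (e.g. count())
        System.out.println(invocations.get()); // 3 -- one run per action
        // In Spark the fix is message.cache() (or persist()) before the
        // actions, so the map runs once and its result is reused.
    }
}
```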

Re: Best ID Generator for ID field in parquet ?

2016-09-04 Thread Kevin Tran
rdpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable

Best ID Generator for ID field in parquet ?

2016-09-04 Thread Kevin Tran
Hi everyone, Please give me your opinions on the best ID generator for an ID field in parquet: UUID.randomUUID(); AtomicReference<Long> currentTime = new AtomicReference<>(System.currentTimeMillis()); AtomicLong counter = new AtomicLong(0); Thanks, Kevin.
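
For comparison, a self-contained sketch of the two approaches mentioned. Random UUIDs need no coordination but cost 36 characters and aren't sortable; a timestamp-plus-counter scheme is compact and roughly time-ordered but needs a worker/partition id mixed in to stay unique across executors. The bit layout below is illustrative (loosely in the spirit of Snowflake-style IDs), not anything from the thread.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class IdGenerators {
    // Collision-free in practice, no coordination, but 36 chars, unsortable.
    static String uuidId() {
        return UUID.randomUUID().toString();
    }

    // 44 bits of epoch-millis, 20 bits of per-JVM counter: compact and
    // roughly time-ordered, but NOT unique across executors unless a
    // worker/partition id is also mixed in.
    private static final AtomicLong COUNTER = new AtomicLong();
    static long timeCounterId(long epochMillis) {
        return (epochMillis << 20) | (COUNTER.getAndIncrement() & 0xFFFFF);
    }
}
```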

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Kevin Tran

Best practises to storing data in Parquet files

2016-08-28 Thread Kevin Tran
Hi, Does anyone know the best practices for storing data in parquet files? Does a parquet file have a size limit (1TB)? Should we use SaveMode.Append for a long-running streaming app? How should we store it in HDFS (directory structure, ...)? Thanks, Kevin.
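
One widely used practice (a suggestion, not from the thread) is to partition by date columns with partitionBy() and append each batch, which yields a Hive-style directory tree in HDFS and keeps individual files bounded. The Spark call is shown in a comment; the runnable helper merely illustrates the directory layout partitionBy() produces, with placeholder names.

```java
import java.time.LocalDate;

public class PartitionLayout {
    // df.write().mode(SaveMode.Append)
    //   .partitionBy("year", "month", "day")
    //   .parquet("/data/events")
    // produces Hive-style directories like the one built below.
    static String partitionDir(LocalDate d) {
        return String.format("year=%d/month=%02d/day=%02d",
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(partitionDir(LocalDate.of(2016, 8, 28)));
        // -> year=2016/month=08/day=28
    }
}
```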

Spark StringType could hold how many characters ?

2016-08-28 Thread Kevin Tran
Hi, I wrote to a parquet file, and querying it shows:

+--------------------+
|word                |
+--------------------+
|THIS IS MY CHARACTERS ...|
|// ANOTHER LINE OF CHAC...|
+--------------------+

These lines are not the full text; they are being trimmed down. Does anyone know how many characters StringType
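
StringType itself has no fixed character limit (a value is bounded only by JVM memory), so the trimming above is almost certainly DataFrame.show(), which by default truncates each displayed cell to 20 characters; df.show(false) or collecting the rows reveals the full strings. A plain-Java sketch of that default truncation rule (the widths here mirror Spark's documented behavior, stated as an assumption):

```java
public class ShowTruncation {
    // Approximates DataFrame.show(true): cells longer than 20 chars are
    // displayed as their first 17 chars plus "..." -- the stored data
    // itself is untouched.
    static String showCell(String s) {
        return s.length() > 20 ? s.substring(0, 17) + "..." : s;
    }

    public static void main(String[] args) {
        System.out.println(showCell("THIS IS MY CHARACTERS AND THE REST"));
        // -> THIS IS MY CHARAC...
    }
}
```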

Write parquet file from Spark Streaming

2016-08-27 Thread Kevin Tran
Hi Everyone, Does anyone know how to write a parquet file after parsing data in Spark Streaming? Thanks, Kevin.
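
The standard Spark 1.x pattern is foreachRDD: convert each micro-batch to a DataFrame and append it as parquet, ideally into a per-batch or per-date directory. The schema, paths, and the batchPath helper below are placeholders, not from the thread; the Spark calls are comments, and the runnable part builds the batch directory name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class StreamingParquetSink {
    // Inside the streaming job:
    //   messages.foreachRDD((rdd, time) -> {
    //       SQLContext sql = SQLContext.getOrCreate(rdd.context());
    //       sql.createDataFrame(rdd, schema)
    //          .write().mode(SaveMode.Append)
    //          .parquet(batchPath("/data/events", time.milliseconds()));
    //   });
    static String batchPath(String base, long batchTimeMillis) {
        String stamp = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(batchTimeMillis));
        return base + "/batch=" + stamp;
    }

    public static void main(String[] args) {
        System.out.println(batchPath("/data/events", 0L));
        // -> /data/events/batch=19700101-000000
    }
}
```

Writing per batch is also how the tiny-file problem from the other thread arises, so a periodic compaction job (or coalesce before the write) usually accompanies this pattern.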