Re: AVRO vs Parquet

2016-03-04 Thread Paul Leclercq
gt;>> tools like Hive, Impala, HAWQ. >>>> >>>> Suggestions? >>>> — >>>> airis.DATA >>>> Timothy Spann, Senior Solutions Architect >>>> C: 609-250-5894 >>>> http://airisdata.com/ >>>> http://meetup.com/nj-datascience >>>> >>>> >>>> >>> >>> >>> -- >>> Donald Drake >>> Drake Consulting >>> http://www.drakeconsulting.com/ >>> https://twitter.com/dondrake <http://www.MailLaunder.com/> >>> 800-733-2143 >>> >> >> > -- Paul Leclercq | Data engineer paul.lecle...@tabmo.io | http://www.tabmo.fr/

Re: Kafka streaming receiver approach - new topic not read from beginning

2016-02-23 Thread Paul Leclercq
}/{partitionId} {newOffset} Source : https://metabroadcast.com/blog/resetting-kafka-offsets 2016-02-22 11:55 GMT+01:00 Paul Leclercq <paul.lecle...@tabmo.io>: > Thanks for your quick answer. > > If I set "auto.offset.reset" to "smallest" as for KafkaParams like th

Re: Kafka streaming receiver approach - new topic not read from beginning

2016-02-22 Thread Paul Leclercq
guration "auto.offset.reset" through parameter > "kafkaParams" which is provided in some other overloaded APIs of > createStream. > > By default Kafka will pick data from the latest offset unless you explicitly > set it; this is the behavior of Kafka, not Spark. > >

Kafka streaming receiver approach - new topic not read from beginning

2016-02-22 Thread Paul Leclercq
offset.reset > to "earliest" for the new consumer in 0.9 and "smallest" for the old > consumer. https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whydoesmyconsumernevergetanydata? Thanks -- Paul Leclercq
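
The consumer-specific values quoted above can be sketched as a small helper that adds "auto.offset.reset" to a parameter map of the kind passed as kafkaParams. This is only an illustration: the function name, the group id, and the ZooKeeper address are hypothetical and do not come from the thread. Note also that Kafka applies this setting only when the consumer group has no committed offset yet.

```python
from typing import Dict


def with_reset_from_beginning(params: Dict[str, str], new_consumer: bool) -> Dict[str, str]:
    """Return a copy of params that asks Kafka to start from the oldest offset.

    Hypothetical helper, not part of Kafka or Spark. The values follow the
    Kafka FAQ quoted in the thread: "earliest" for the new consumer (0.9)
    and "smallest" for the old consumer.
    """
    updated = dict(params)  # leave the caller's map untouched
    updated["auto.offset.reset"] = "earliest" if new_consumer else "smallest"
    return updated


# Assumed example values; replace with your own group id and ZooKeeper quorum.
kafka_params = with_reset_from_beginning(
    {"group.id": "example-group", "zookeeper.connect": "localhost:2181"},
    new_consumer=False,
)
print(kafka_params["auto.offset.reset"])
```

The resulting map would then be handed to whichever createStream overload accepts kafkaParams.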

Re: spark-1.2.0--standalone-ha-zookeeper

2016-01-20 Thread Paul Leclercq
Hi Raghvendra and Spark users, I also have trouble activating my standby master when my first master is shut down (via ./sbin/stop-master.sh or via an instance shutdown) and just want to share my thoughts with you. To answer your question Raghvendra, in *spark-env.sh*, if 2 IPs are set for

Re: Spark streaming job hangs

2015-12-01 Thread Paul Leclercq
- Added > jobs for time 144894989 ms > 2015-12-01 06:04:55,064 [JobGenerator] INFO (Logging.scala:59) - Added > jobs for time 1448949895000 ms > 2015-12-01 06:05:00,125 [JobGenerator] INFO (Logging.scala:59) - Added > jobs for time 144894990 ms > > > Thanks > LCassa > -- Paul Leclercq | Data engineer paul.lecle...@tabmo.io | http://www.tabmo.fr/
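
When reading JobGenerator lines like the ones quoted above, it can help to check that the batch time in milliseconds matches the wall-clock timestamp of the log line. A minimal sketch, assuming the logger writes timestamps in UTC (the thread does not say so); the helper name is hypothetical.

```python
from datetime import datetime, timezone


def batch_time_to_utc(batch_time_ms: int) -> datetime:
    """Convert a batch time in epoch milliseconds to a UTC datetime."""
    return datetime.fromtimestamp(batch_time_ms / 1000, tz=timezone.utc)


# Batch time taken from the one intact log line in the snippet.
batch_utc = batch_time_to_utc(1448949895000)
print(batch_utc.isoformat())  # 2015-12-01T06:04:55+00:00
```

Under that assumption, the batch time agrees with the 06:04:55 log timestamp, which suggests the job generator was still scheduling batches on time when the hang was reported.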