Streaming : WAL ignored

2018-05-24 Thread Walid Lezzar
Hi, I have a Spark Streaming application running on YARN that consumes from a JMS source. I have checkpointing and the WAL enabled to ensure zero data loss. However, when I suddenly kill my application and restart it, sometimes it recovers the data from the WAL but sometimes it doesn't! In
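For context, a minimal sketch of the zero-data-loss setup described above, assuming Scala, a durable checkpoint directory and a custom JMS receiver (the path and names are illustrative, not from the thread): the WAL is enabled via configuration and the context is rebuilt from the checkpoint on restart with StreamingContext.getOrCreate.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JmsStreamingApp {
  // Hypothetical durable checkpoint location (HDFS/S3), not from the thread.
  val checkpointDir = "hdfs:///checkpoints/jms-streaming-app"

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("jms-consumer")
      // Persist received blocks to the write-ahead log before acknowledging them.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... create the JMS receiver stream and the processing logic here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, rebuild the context (and recover WAL data) from the checkpoint
    // instead of calling createContext again.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}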

Re: How to read the schema of a partitioned dataframe without listing all the partitions ?

2018-04-27 Thread Walid Lezzar
spark.apache.org > Global Temporary View. Temporary views in Spark SQL are session-scoped and will disappear if the session that creates them terminates. If you want to have a temporary view that is shared among all sessions and kept alive until the Spark application terminates,
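A quick illustration of the distinction quoted above (a sketch only; the session, path and view names are made up): a regular temporary view is scoped to its session, while a global temporary view lives in the reserved global_temp database and remains visible to other sessions for the lifetime of the application.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("global-temp-view-demo").getOrCreate()
val df = spark.read.parquet("s3://my-bucket/events")  // hypothetical path

// Session-scoped: disappears when this session ends.
df.createOrReplaceTempView("events")

// Application-scoped: shared across sessions via the reserved global_temp database.
df.createGlobalTempView("events_global")
spark.newSession().sql("SELECT COUNT(*) FROM global_temp.events_global").show()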

How to read the schema of a partitioned dataframe without listing all the partitions ?

2018-04-27 Thread Walid LEZZAR
Hi, I have a Parquet table on S3 partitioned by day. I have 2 years of data (about 1000 partitions). With Spark, when I just want to know the schema of this Parquet table without even asking for a single row of data, Spark tries to list all the partitions and the nested partitions of the table. Which
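One common workaround (a sketch, not from the thread; the bucket, path and partition column names are assumed): read a single known partition directory so that only that path gets listed, take its file schema, and add the partition column back by hand to reconstruct the full table schema.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("schema-probe").getOrCreate()

// Probe a single known partition directory so only that path is listed.
val oneDay = spark.read.parquet("s3://my-bucket/events/day=2018-04-01")

// The partition column exists only in the directory name, so append it manually
// to reconstruct the full table schema.
val fullSchema = StructType(oneDay.schema.fields :+ StructField("day", StringType))
println(fullSchema.treeString)

The recovered schema can then be supplied explicitly via spark.read.schema(fullSchema) on later reads, so Spark skips schema inference over every partition.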

Re: Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-02 Thread Walid Lezzar
Thanks, Saisai. On Sat, Apr 2, 2016 at 6:48 AM, Walid Lezzar <walez...@gmail.com> wrote: Hi, I looked into the Spark code at how Spark reports metrics using the MetricsSystem class. I've seen that the Spark MetricsSystem class, when

Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-01 Thread Walid Lezzar
Hi, I looked into the Spark code at how Spark reports metrics using the MetricsSystem class. I've seen that the Spark MetricsSystem class, when instantiated, parses the metrics.properties file, tries to find the sink class names and loads them dynamically. It would be great to implement my own
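The workaround that usually comes up (a sketch, assuming the Dropwizard ConsoleReporter; class and sink names are illustrative): place the custom sink in the org.apache.spark.metrics.sink package so it can see the package-private Sink trait, and give it the constructor signature that MetricsSystem looks up reflectively (Spark 1.x/2.x expects a SecurityManager parameter; later versions dropped it).

// Placing the class in this package is the usual workaround for the
// private[spark] visibility of the Sink trait.
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}
import org.apache.spark.SecurityManager

// Constructor arity matches what the 1.x/2.x MetricsSystem instantiates
// reflectively; newer Spark versions dropped the SecurityManager argument.
class MyConsoleSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  private val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

  override def start(): Unit = reporter.start(10, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}

The sink is then wired in through metrics.properties, e.g. *.sink.console2.class=org.apache.spark.metrics.sink.MyConsoleSink.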

[Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Walid LEZZAR
Hi, we have been using Spark Streaming for a little while now. Until now, we were running our Spark Streaming jobs on Spark 1.5.1 and they were working well. Yesterday, we upgraded to Spark 1.6.0 without any changes in the code, but our streaming jobs are no longer working. We are getting an
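Not from the thread, but for context: java.lang.AbstractMethodError after a version upgrade usually points to a binary-compatibility mismatch, i.e. some jar on the classpath was compiled against the older Spark API. A sketch of keeping the build aligned, assuming sbt (module names are illustrative):

// build.sbt -- align everything on the cluster's Spark version.
val sparkVersion = "1.6.0"

libraryDependencies ++= Seq(
  // "provided" keeps the application jar from shading an older Spark build;
  // the cluster's 1.6.0 jars are used at runtime instead.
  "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
  // Any receiver/connector library must also be a build compiled against the
  // same Spark line; a mismatch typically surfaces as AbstractMethodError.
)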