Re: Spark streaming - TIBCO EMS

2017-05-15 Thread Piotr Smoliński
Hi Pradeep,

You need to connect via the regular JMS API. Obtain the connection factory from JNDI or create it directly using com.tibco.tibjms.TibjmsConnectionFactory. On the classpath you need the JMS 2.0 API (jms-2.0.jar) and the EMS driver classes (tibjms.jar).

Regards,
Piotr
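A minimal Scala sketch of the direct-factory approach (not from the original message; the server URL, credentials, and queue name are placeholders):

    import javax.jms.{Connection, MessageConsumer, Session, TextMessage}
    import com.tibco.tibjms.TibjmsConnectionFactory

    // Create the factory directly instead of looking it up in JNDI.
    val factory = new TibjmsConnectionFactory("tcp://ems-host:7222")
    val connection: Connection = factory.createConnection("user", "password")
    val session: Session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val consumer: MessageConsumer =
      session.createConsumer(session.createQueue("sample.queue"))
    connection.start()

    // Block up to 5 seconds for one message; text payloads arrive as TextMessage.
    consumer.receive(5000) match {
      case t: TextMessage => println(t.getText)
      case other          => println(s"no text message received: $other")
    }
    connection.close()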

Re: Can Spark support exactly-once based on Kafka? Due to the following questions?

2016-12-05 Thread Piotr Smoliński
The boundary is a bit flexible. In terms of the observed effective state of the DStream, the direct stream semantics are exactly-once. In terms of observations by external systems (like message emission), Spark Streaming semantics are at-least-once.

Regards,
Piotr
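A sketch of the standard pattern this distinction implies (not from the thread; Kafka 0.10 direct stream API, with broker, topic, and group id as placeholders, and an existing StreamingContext ssc assumed): make the output idempotent and commit offsets only after it succeeds, so the at-least-once side effects become effectively exactly-once.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // This write may run more than once on failure; key it on
      // topic/partition/offset (an upsert) so replays are harmless.
      rdd.foreachPartition(_.foreach(r => println(s"${r.key} -> ${r.value}")))
      // Commit only after the output succeeded.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsets)
    }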

Re: Is Spark 2.0 master node compatible with Spark 1.5 worker node?

2016-09-26 Thread Piotr Smoliński
On YARN you submit the whole application. Unless the distribution provider does strange classpath "optimisations", you may submit a Spark 2 application alongside Spark 1.5 or 1.6 ones. It is YARN's responsibility to deliver the application files and the Spark assembly to the workers. What's more, …
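A sketch (not from the thread) of what per-application versioning looks like with Spark's programmatic launcher; the paths and class name are placeholders. The Spark version is chosen per submission via the distribution that setSparkHome points at, and YARN ships that distribution's files to the containers:

    import org.apache.spark.launcher.SparkLauncher

    val handle = new SparkLauncher()
      .setSparkHome("/opt/spark-2.0.0")   // Spark 2 distribution, independent of any 1.5/1.6 install
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")
      .startApplication()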

Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-26 Thread Piotr Smoliński
> That's what happened. In fact, there are about 100 files on each worker
> node in a directory corresponding to the write.
>
> Any way to tone that down a bit (maybe 1 file per worker)? Or, write a
> single file somewhere?

Re: Writing Dataframe to CSV yields blank file called "_SUCCESS"

2016-09-25 Thread Piotr Smoliński
Hi Peter,

The blank _SUCCESS file indicates a properly finished output operation. What is the topology of your application? I presume you write to the local filesystem and have more than one worker machine. In such a case Spark writes the result files for each partition (on the worker which holds …
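Not from the thread, but a minimal sketch of the usual remedies (assuming df is the DataFrame being written; the path is a placeholder): write to a filesystem all workers share, and reduce the partition count if a single file is wanted.

    // One part file plus the _SUCCESS marker, on a shared filesystem (HDFS/S3).
    df.coalesce(1)          // collapse to a single partition
      .write
      .option("header", "true")
      .csv("hdfs:///data/out/report")

Note that coalesce(1) funnels the whole result through one task, so it only makes sense for output small enough for a single executor.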

Re: Cached Parquet file paths problem

2016-04-19 Thread Piotr Smoliński
Solved it. The anonymous RDDs can be cached in the cacheManager in the SQLContext. To remove all the cached content, use:

sqlContext.clearCache()

The warning symptom of the failed data frame registration is the following entry in the log:

16/04/16 20:18:39 [tp439928219-110] WARN …
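A sketch of the workaround in context (the path and table name are hypothetical): after the ETL replaces a month's Parquet files, clear the cache before re-reading so the stale file paths are not reused.

    // Spark 1.6-era API: read and register, then refresh after the files change.
    val before = sqlContext.read.parquet("/data/events")
    before.registerTempTable("events")

    // ... external ETL replaces part files under /data/events ...

    sqlContext.clearCache()             // drop cached plans/RDDs holding the old file list
    val after = sqlContext.read.parquet("/data/events")
    after.registerTempTable("events")   // re-register against the fresh listing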

Cached Parquet file paths problem

2016-03-22 Thread Piotr Smoliński
Hi,

After migrating from Spark 1.5.2 to 1.6.1 I faced a strange issue. I have a Parquet directory with partitions. Each partition (month) is the subject of an incremental ETL that takes the current Avro files and replaces the corresponding Parquet files. Now there is a problem that appeared in 1.6.x: I …
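A sketch of the kind of per-partition refresh described (assuming the spark-avro package; all paths are placeholders): the overwrite targets only the month directory being rebuilt.

    // Rebuild one month: convert its Avro input and overwrite just that partition path.
    sqlContext.read.format("com.databricks.spark.avro")
      .load("/staging/events/2016-03")
      .write.mode("overwrite")
      .parquet("/data/events/month=2016-03")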