Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread Tathagata Das
Note that this is not a public API yet, hence it is not well documented. So use it at your own risk :) On Tue, Jul 10, 2018 at 11:04 AM, subramgr wrote: > Hi, > > This *trait* looks very daunting. Is there some blog post or article > which explains how to implement this *trait*?

[Structured Streaming] Fine tuning GC performance

2018-07-10 Thread subramgr
Hi, Are there any specific methods to fine-tune our Structured Streaming job? Or is it similar to the ones mentioned here for RDDs: https://spark.apache.org/docs/latest/tuning.html
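For reference, the standard RDD-level knobs apply to Structured Streaming too, since it runs on the same executors. A minimal sketch, assuming client mode so the builder config reaches the executors; the flag values are illustrative starting points, not recommendations:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("streaming-gc-tuning")
      // Executor JVM options must be set before executors launch.
      .config("spark.executor.extraJavaOptions",
        "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc")
      .config("spark.memory.fraction", "0.6") // heap share for execution + storage
      .getOrCreate()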

Re: Unable to alter partition. The transaction for alter partition did not commit successfully.

2018-07-10 Thread Arun Hive
I am reading data from Kafka topics using createStream and pushing it to Hive using DataFrames. The job runs fine for 5-6 hours and then fails with the above exception. On Wednesday, May 30, 2018, 3:31:10 PM PDT, naresh Goud wrote: What are you doing? Give more

Convert scientific notation DecimalType

2018-07-10 Thread dimitris plakas
Hello everyone, I am new to PySpark and I am facing a problem casting some values to DecimalType. To clarify my question I present an example. I have a dataframe in which I store my data, which are some trajectories. The dataframe looks like *Id | Trajectory* id1 | [ [x1, y1, t1], [x2, y2,
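The message is truncated, but a common fix for values that print in scientific notation is an explicit cast to DecimalType(precision, scale). A minimal Scala sketch (the PySpark API mirrors it; the column name and precision/scale are hypothetical):

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.DecimalType

    // Cast a numeric column `x` of DataFrame `df` to a fixed-precision
    // decimal so it no longer renders in scientific notation.
    val fixed = df.withColumn("x", col("x").cast(DecimalType(20, 10)))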

[Spark MLib]: RDD caching behavior of KMeans

2018-07-10 Thread mkhan37
Hi All, I was varying the storage levels of RDD caching in the KMeans program implemented using the MLlib library and got some very confusing and interesting results. The base code of the application is from a benchmark suite named SparkBench. I changed
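For context, a minimal sketch of how the input RDD's storage level is typically varied around MLlib's KMeans (path, k, and iteration count are hypothetical):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.storage.StorageLevel

    val points = sc.textFile("hdfs:///data/points.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

    // The storage level under test; swap in MEMORY_ONLY, DISK_ONLY, etc.
    // KMeans iterates over this RDD, so the level chosen matters.
    points.persist(StorageLevel.MEMORY_AND_DISK)

    val model = KMeans.train(points, 10, 20) // k = 10, 20 iterations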

Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread subramgr
Hi, This *trait* looks very daunting. Is there some blog post or article which explains how to implement this *trait*? Thanks Girish

How Kryo serializer allocates buffer in Spark

2018-07-10 Thread nirav
I am getting the following error in a Spark task. The default max value is 64mb! The documentation says it should be large enough to store the largest object in my application. I don't think I have any object that is bigger than 64mb. So what do these values (spark.kryoserializer.buffer, spark.kryoserializer.buffer.max)
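For reference, the two settings are the initial and maximum sizes of Kryo's per-core serialization buffer: the buffer starts at `buffer` and grows per record up to `buffer.max`, so a "buffer overflow" error means a single serialized record needed more than the max, not that the total data did. A sketch of raising them, with illustrative values:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryoserializer.buffer", "1m")        // initial buffer size
      .config("spark.kryoserializer.buffer.max", "256m")  // ceiling per record
      .getOrCreate()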

Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread Stefan Van Wouw
Hi Girish, You can implement a custom state store provider by implementing the StateStore trait ( https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ) and setting the correct Spark configuration accordingly:
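The message is truncated at the configuration; a minimal sketch of that setting, assuming a hypothetical Cassandra-backed provider class:

    // spark.sql.streaming.stateStore.providerClass selects the state store
    // implementation (the default is the HDFS-backed provider); the class
    // name below is hypothetical.
    spark.conf.set(
      "spark.sql.streaming.stateStore.providerClass",
      "com.example.state.CassandraStateStoreProvider")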

Re: Dynamic allocation not releasing executors after unpersisting all cached data

2018-07-10 Thread Jeffrey Charles
Thanks for the suggestion. I gave it a try but the executor still isn't being released several minutes after running that. On Mon, Jul 9, 2018 at 3:51 PM Vadim Semenov wrote: > Try doing `unpersist(blocking=true)` > On Mon, Jul 9, 2018 at 2:59 PM Jeffrey Charles wrote: > > I'm persisting
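For reference, a sketch of the blocking unpersist suggested in the thread, with a note on the dynamic-allocation timeout that governs executors holding cached data (input path is hypothetical):

    val df = spark.read.parquet("hdfs:///data/in")
    df.persist()
    // ... run the jobs that use df ...
    df.unpersist(blocking = true) // wait until the blocks are actually dropped

    // Note: executors that held cached blocks are reclaimed only after
    // spark.dynamicAllocation.cachedExecutorIdleTimeout (default: infinity),
    // not the ordinary spark.dynamicAllocation.executorIdleTimeout, which
    // may explain executors lingering even after an unpersist.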

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-10 Thread Mark Hamstra
It's been done many times before by many organizations. Use Spark Job Server or Livy or create your own implementation of a similar long-running Spark Application. Creating a new Application for every Job is not the way to achieve low-latency performance. On Tue, Jul 10, 2018 at 4:18 AM wrote:

Re: Spark on Mesos - Weird behavior

2018-07-10 Thread Thodoris Zois
Actually, after some experiments we figured out that spark.cores.max / spark.executor.cores is the upper bound for the number of executors. Spark apps will run even if only one executor can be launched. Is there any way to also specify the lower bound? It is a bit annoying that it seems we can’t
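A sketch of the relationship described above, with illustrative values; as the thread suggests, there is no corresponding lower-bound setting:

    import org.apache.spark.sql.SparkSession

    // At most spark.cores.max / spark.executor.cores = 16 / 4 = 4 executors,
    // but Spark will still start with fewer if Mesos offers less.
    val spark = SparkSession.builder()
      .config("spark.cores.max", "16")
      .config("spark.executor.cores", "4")
      .getOrCreate()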

[Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread subramgr
Hi, Currently we are using HDFS for our checkpointing, but we are having issues maintaining an HDFS cluster. We tried GlusterFS in the past for checkpointing, but in our setup GlusterFS does not work well. We are evaluating Cassandra for storing the checkpoint data. Has anyone implemented

Re: Emit Custom metrics in Spark Structured Streaming job

2018-07-10 Thread subramgr
Hi, I am still implementing my idea, but here is how it goes: 1. Use this library: https://github.com/groupon/spark-metrics 2. Have a cron job which periodically curls the /metrics/json endpoint on the driver and all other nodes 3. Parse the response and send the data through a telegraf agent

[ANNOUNCE] Apache Spark 2.2.2

2018-07-10 Thread Tom Graves
We are happy to announce the availability of Spark 2.2.2! Apache Spark 2.2.2 is a maintenance release, based on the branch-2.2 maintenance branch of Spark. We strongly recommend that all 2.2.x users upgrade to this stable release. The release notes are available at

Unpivoting

2018-07-10 Thread amin mohebbi
Does anyone know how to transpose the columns in Spark (Scala)? This is how I want to unpivot the table, based on multiple columns. I am using Scala and Spark to unpivot a
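The message is truncated, but unpivoting in Spark SQL is commonly done with the built-in stack() expression. A minimal Scala sketch with a hypothetical wide table (Id, q1, q2, q3):

    // stack(n, label1, col1, ..., labelN, colN) emits n rows per input row,
    // turning the listed columns into (quarter, value) pairs.
    val unpivoted = df.selectExpr(
      "Id",
      "stack(3, 'q1', q1, 'q2', q2, 'q3', q3) as (quarter, value)")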

Re: Spark on Mesos - Weird behavior

2018-07-10 Thread Pavel Plotnikov
Hello Thodoris! Have you checked the following: - does the Mesos cluster have available resources? - does Spark have tasks waiting in the queue for longer than the spark.dynamicAllocation.schedulerBacklogTimeout configuration value? - and have you checked that Mesos sends offers to the Spark app's Mesos framework at least

Re: Emit Custom metrics in Spark Structured Streaming job

2018-07-10 Thread chandan prakash
Hi Subramanian, Did you find any solution for this? I am looking for something similar too. Regards, Chandan On Wed, Jun 27, 2018 at 9:47 AM subramgr wrote: > I am planning to send these metrics to our KairosDB. Let me know if there > are any examples that I can take a look at

Run STA/LTA python function using spark streaming: java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute

2018-07-10 Thread zakhavan
Hello, I'm trying to run the STA/LTA Python code, which I got from the ObsPy website, using Spark Streaming, and plot the events, but I keep getting the following error! "java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute" Here is the code:
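That error means the streaming graph was started without any output operation (print, foreachRDD, saveAs...) registered on a DStream. A minimal Scala sketch of the required shape (the PySpark fix is analogous; the source, batch interval, and per-batch logic are placeholders):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(1))
    val samples = ssc.socketTextStream("localhost", 9999)

    // Registering at least one output operation is exactly what the
    // "No output operations registered" check is asking for.
    samples.foreachRDD { rdd =>
      // run the STA/LTA trigger computation on each batch here
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()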