Limit on the number of Jobs per Application

2018-05-30 Thread Jeremy Davis
I have an application that runs many thousand univariate GLM regressions, and it seems to break down after completing around 25K jobs. Plenty of resources are free (disk, network, memory, CPU), but eventually it schedules on only a few threads out of the 400+ available on the cluster. No task
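For context, a hedged sketch of the kind of workload described (the column list featureCols, the label name, and data are hypothetical, not from the thread); each fit launches its own Spark jobs, so thousands of fits push tens of thousands of jobs through a single scheduler:

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.GeneralizedLinearRegression

    featureCols.foreach { c =>
      // one univariate regression per feature column
      val assembled = new VectorAssembler()
        .setInputCols(Array(c))
        .setOutputCol("features")
        .transform(data)
      val model = new GeneralizedLinearRegression()
        .setLabelCol("label")
        .fit(assembled)
    }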

Re: help needed in performance improvement of spark structured streaming

2018-05-30 Thread amit kumar singh
Hi team, any help with this? I have a use case where I need to call a stored procedure through structured streaming. I am able to send a Kafka message and call the stored procedure, but since the foreach sink keeps executing the stored procedure per message, I want to combine all the messages in single
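A hedged sketch of one way to batch the calls: in Spark 2.4+ the foreachBatch sink hands over each micro-batch as a DataFrame, so the procedure can be invoked once per batch instead of once per message. The kafkaDF name and the callStoredProcedure helper are assumptions, not code from the thread; the value column is the standard Kafka source schema:

    // assumes Spark 2.4+, where the foreachBatch sink is available
    kafkaDF.writeStream
      .foreachBatch { (batch: org.apache.spark.sql.DataFrame, batchId: Long) =>
        // collapse the whole micro-batch into one delimited payload
        val payload = batch.selectExpr("CAST(value AS STRING)")
          .collect()
          .map(_.getString(0))
          .mkString(",")
        callStoredProcedure(payload) // hypothetical JDBC wrapper: one call per batch
      }
      .start()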

How can we group by messages coming in per batch of structured streaming

2018-05-30 Thread amit kumar singh
Hi Team, I have a requirement where I need to combine all JSON messages arriving in a batch of structured streaming into one single JSON message, separated by a comma or any other delimiter, and store it. I have tried to group by Kafka partition and I tried using concat, but it's not working
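One hedged way to express this inside a micro-batch (for example in a foreachBatch handler, Spark 2.4+; the batch name is an assumption): aggregate the whole batch down to a single comma-delimited string with collect_list and concat_ws. The value column assumes the Kafka source schema:

    import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

    // one row out: every message in the micro-batch joined with commas
    val combined = batch
      .agg(concat_ws(",", collect_list(col("value").cast("string"))).as("payload"))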

Re: trying to understand structured streaming aggregation with watermark and append outputmode

2018-05-30 Thread Koert Kuipers
Thanks, that's helpful. On Wed, May 30, 2018 at 5:05 PM, Lalwani, Jayesh <jayesh.lalw...@capitalone.com> wrote: > Few things > 1. Append mode is going to output data that falls out of the watermark > 2. Structured streaming isn't time based. It reacts only when it sees > input

Re: Unable to alter partition. The transaction for alter partition did not commit successfully.

2018-05-30 Thread naresh Goud
What are you doing? Give more details on what you are doing. On Wed, May 30, 2018 at 12:58 PM Arun Hive wrote: > Hi > While running my spark job component I am getting the following exception. > Requesting your help on this: > Spark core version - > spark-core_2.10-2.1.1 > > Spark

Re: trying to understand structured streaming aggregation with watermark and append outputmode

2018-05-30 Thread Lalwani, Jayesh
Few things:
1. Append mode is going to output data that falls out of the watermark.
2. Structured streaming isn't time based. It reacts only when it sees input data. If no data appears in the input it will not move the aggregation window.
3. Clock time is irrelevant to structured
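A minimal sketch of the setup under discussion, assuming a streaming DataFrame events with an event-time column eventTime (both names hypothetical): in append mode, a window's count is emitted only after the watermark passes the window end, and the watermark only advances when new input arrives.

    import org.apache.spark.sql.functions.{col, window}

    val counts = events
      .withWatermark("eventTime", "15 minutes")        // tolerate 15 min of lateness
      .groupBy(window(col("eventTime"), "10 minutes")) // 10-minute tumbling windows
      .count()

    // append mode: each window is output once, after the watermark closes it
    counts.writeStream.outputMode("append").format("console").start()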

Re: Data is not getting written in sorted format on target oracle table through SPARK

2018-05-30 Thread Lalwani, Jayesh
No. There is no way to control the order except for the option that you have already tried (repartition = 1). When you are inserting in parallel from multiple nodes, the order of inserts cannot be guaranteed; that is the very nature of doing things in parallel. The only way order
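A minimal sketch of the single-partition approach, using the table from the original question and a hypothetical Oracle JDBC URL; collapsing to one partition serializes the write, which preserves insert order at the cost of all parallelism:

    // one partition => one writer task => rows arrive in sorted order
    df.repartition(1)
      .sortWithinPartitions("emp_id")
      .write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical
      .option("dbtable", "EMPLOYEE")
      .mode("append")
      .save()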

Re: Unable to alter partition. The transaction for alter partition did not commit successfully.

2018-05-30 Thread Arun Hive
Hi, While running my spark job component I am getting the following exception. Requesting your help on this: Spark core version - spark-core_2.10-2.1.1 Spark streaming version - spark-streaming_2.10-2.1.1 Spark hive version - spark-hive_2.10-2.1.1 2018-05-28 00:08:04,317

Closing IPC connection

2018-05-30 Thread Arun Hive
Hi, While running my spark job component I am getting the following exception. Requesting your help on this: Spark core version - spark-core_2.10-2.1.1 Spark streaming version - spark-streaming_2.10-2.1.1 Spark hive version - spark-hive_2.10-2.1.1 b-executor-0] DEBUG (Client.java:428) - The

Error while creating table with space with/without partition

2018-05-30 Thread abhijeet bedagkar
I am facing a weird situation wherein the insert overwrite query does not give any error on being executed against a table which contains a column with a space in its name. Following are the queries which give no error: CREATE TABLE TEST_PART (`col1 ` STRING) PARTITIONED BY (`col2` STRING)
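A hedged reproduction of the scenario described (the failing query is truncated above, so the INSERT below is an assumption about its shape): the backtick-quoted column name `col1 ` carries a trailing space, and whether it round-trips correctly depends on the Hive metastore behind the table.

    // the CREATE succeeds; note the trailing space inside the backticks
    spark.sql("CREATE TABLE TEST_PART (`col1 ` STRING) PARTITIONED BY (`col2` STRING)")

    // hypothetical insert of the kind the thread describes
    spark.sql("INSERT OVERWRITE TABLE TEST_PART PARTITION (col2 = 'a') SELECT 'x'")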

Thrift server not exposing temp tables (spark.sql.hive.thriftServer.singleSession=true)

2018-05-30 Thread Daniel Haviv
Hi, I would like to expose a DF through the Thrift server, but even though I enable spark.sql.hive.thriftServer.singleSession I still can't see the temp table. I'm using Spark 2.2.0: spark-shell --conf spark.sql.hive.thriftServer.singleSession=true import
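A minimal sketch of the usual recipe, assuming the Thrift server is started from inside the same spark-shell session (the sample data and view name are hypothetical); a Thrift server launched as a separate process will not see another session's temp views even with singleSession enabled:

    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val df = spark.range(10).toDF("id")     // hypothetical sample data
    df.createOrReplaceTempView("my_temp_table")

    // start the Thrift server inside this session so it shares the temp views
    HiveThriftServer2.startWithContext(spark.sqlContext)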

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Jeff Frylings
The logs are not the problem; it is the shuffle files that are not being cleaned up. We do have the configs for log rolling and that is working just fine. ex: /mnt/blockmgr-d65d4a74-d59a-4a06-af93-ba29232f7c5b/31/shuffle_1_46_0.data > On May 30, 2018, at 9:54 AM, Ajay wrote: > > I have used
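A hedged sketch of settings that can help with leftover shuffle files (these are standard Spark cleaner configs, not something confirmed in this thread, and their effectiveness varies by deployment): the ContextCleaner removes shuffle files only after the owning RDDs are garbage collected on the driver, so forcing periodic GC can unblock cleanup.

    val conf = new org.apache.spark.SparkConf()
      // run a driver GC periodically so the ContextCleaner sees dead RDDs
      .set("spark.cleaner.periodicGC.interval", "15min")
      // block on cleanup tasks instead of letting them queue up
      .set("spark.cleaner.referenceTracking.blocking", "true")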

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Ajay
I have used these configs in the past to clean up the executor logs:
.set("spark.executor.logs.rolling.time.interval", "minutely")
.set("spark.executor.logs.rolling.strategy", "time")
.set("spark.executor.logs.rolling.maxRetainedFiles", "1")
On Wed, May 30, 2018 at 8:49 AM

Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Jeff Frylings
Intermittently on spark executors we are seeing blockmgr directories not being cleaned up after execution and is filling up disk. These executors are using Mesos dynamic resource allocation and no single app using an executor seems to be the culprit. Sometimes an app will run and be cleaned

Apache Spark is not working as expected

2018-05-30 Thread remil
hadoopuser@sherin-VirtualBox:/usr/lib/spark/bin$ spark-shell spark-shell: command not found hadoopuser@sherin-VirtualBox:/usr/lib/spark/bin$ Spark.odt -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: testing frameworks

2018-05-30 Thread Holden Karau
So Jesse has an excellent blog post on how to use it with Java applications - http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/ On Wed, May 30, 2018 at 4:14 AM Spico Florin wrote: > Hello! > I'm also looking for unit testing spark Java application. I've seen the > great
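For reference, a minimal sketch using spark-testing-base's Scala API (the project also ships Java helpers such as SharedJavaSparkContext; the test below is illustrative, not taken from the thread):

    import com.holdenkarau.spark.testing.SharedSparkContext
    import org.scalatest.FunSuite

    class WordCountTest extends FunSuite with SharedSparkContext {
      test("word count over a tiny RDD") {
        // sc is provided by the SharedSparkContext trait
        val counts = sc.parallelize(Seq("a", "b", "a"))
          .map(w => (w, 1))
          .reduceByKey(_ + _)
          .collectAsMap()
        assert(counts("a") === 2)
      }
    }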

Re: testing frameworks

2018-05-30 Thread Spico Florin
Hello! I'm also looking into unit testing Spark Java applications. I've seen the great work done in spark-testing-base, but it seemed to me that I could not use it for Spark Java applications. Are only Spark Scala applications supported? Thanks. Regards, Florin On Wed, May 23, 2018 at 8:07 AM,

Re: Positive log-likelihood with Gaussian mixture

2018-05-30 Thread Simon Dirmeier
I see, thanks for clearing that up. I was aware of that fact for uniform distributions, but not for normal ones. So that would mean some of the components have such a small variance that the log-likelihood is positive in the end? Cheers, Simon On 30.05.18 at 11:22, robin.e...@xense.co.uk wrote:

Data is not getting written in sorted format on target oracle table through SPARK

2018-05-30 Thread abhijeet bedagkar
Hi, I have a table in hive with the below schema: emp_id:int, emp_name:string. I have created a data frame from the above hive table: df = sql_context.sql('SELECT * FROM employee ORDER BY emp_id') df.show() After the above code is run I see that the data is sorted properly on emp_id. After this I am trying to write

Re: Positive log-likelihood with Gaussian mixture

2018-05-30 Thread robin . east
Positive log likelihoods for continuous distributions are not unusual. You are evaluating a pdf, not a probability. For example, a univariate Gaussian pdf exceeds 1 at the mean when the standard deviation goes below about 0.4, at which point the log pdf is positive.
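A worked check of that threshold (standard notation, not from the original message):

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
    \qquad\Longrightarrow\qquad
    f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} > 1
    \iff \sigma < \frac{1}{\sqrt{2\pi}} \approx 0.3989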

Re: Dataframe save as table operation is failing when the child column names contain special characters

2018-05-30 Thread abhijeet bedagkar
I dug further into this issue. 1. It seems like this issue originates from the Hive metastore, since when I tried to execute a query with a sub-column containing special characters, it did not work for me despite adding backticks. 2. I solved this issue by explicitly passing a SQL expression to the data