spark kudu issues

2018-06-20 Thread Pietro Gentile
Hi all, I am currently evaluating Spark with Kudu and am facing the following issues: 1) If you try to DELETE a row whose key is not present in the table, you get an exception like this: java.lang.RuntimeException: failed to write N rows from DataFrame to Kudu; sample errors:

Fwd: Spark CacheManager Thread-safety

2016-05-20 Thread Pietro Gentile
Hi all, I have a number of doubts about the CacheManager used by SQLContext to cache DataFrames. My use case requires different threads persisting/reading DataFrames concurrently. While using Spark I found that persistence does not really work in parallel. I would like it if I'm persisting a data

Spark CacheManager Thread-safety

2016-05-20 Thread Pietro Gentile
Hi all, I have a number of doubts about the CacheManager used by SQLContext to cache DataFrames. My use case requires different threads persisting/reading DataFrames concurrently. While using Spark I found that persistence does not really work in parallel. I would like it if I'm persisting a data

Spark Web UI issue

2016-05-06 Thread Pietro Gentile
Hi all, I have a running Spark application to which I submit jobs continuously. These jobs use different instances of sqlContext, so the application's web UI fills up more and more with these instances. Is there any way to prevent this? I don't want to see created sql context in the web

Spark Cache Eviction

2016-02-22 Thread Pietro Gentile
Hi all, Is there a way to prevent eviction of an RDD from the SparkContext? I do not want to use the cache with its default (LRU) behavior; I would rather manually unpersist RDDs cached in memory/disk. Thanks in advance, Pietro.

NOT IN in Spark SQL

2015-09-03 Thread Pietro Gentile
Hi all, How can I use the "NOT IN" clause in Spark SQL 1.2? It keeps giving me syntax errors, but the query is valid SQL. Thanks in advance, Best regards, Pietro.

SPARK REMOTE DEBUG

2015-06-29 Thread Pietro Gentile
Hi all, What is the best way to remotely debug Spark apps with breakpoints? Thanks in advance, Best regards! Pietro

Spark SQL and Streaming Results

2015-06-05 Thread Pietro Gentile
Hi all, what is the best way to run Spark SQL queries and obtain the result tuples in a streaming way? In particular, I want to aggregate data and get the first, incomplete results quickly, with the results being updated until the aggregation completes. Best Regards.

Hardware provisioning for Spark SQL

2015-04-29 Thread Pietro Gentile
Hi all, I have to estimate resource requirements for my Hadoop/Spark cluster. In particular, I have to query about 100 TB of HBase table data to do aggregation with Spark SQL. What is, approximately, the most suitable cluster configuration for my use case, so that data can be queried quickly? At last

[spark-jobserver] Submit Job in yarn-cluster mode (?)

2015-01-14 Thread Pietro Gentile
me how to use Spark as a Service in yarn-cluster mode? Thanks in advance and Best Regards, Pietro Gentile

Spark 1.1.0 and HBase: Snappy UnsatisfiedLinkError

2014-11-25 Thread Pietro Gentile
Hi everyone, I deployed Spark 1.1.0 and I'm trying to use it with spark-job-server 0.4.0 (https://github.com/ooyala/spark-jobserver). I previously used Spark 1.0.2 and had no problems with it. I want to use the newer version of Spark (and Spark SQL) to create the SchemaRDD programmatically.