Hi all,
I am currently evaluating Spark with Kudu.
I am facing the following issues:
1) If you try to DELETE a row with a key that is not present in the table,
you get an exception like this:
java.lang.RuntimeException: failed to write N rows from DataFrame to Kudu;
sample errors:
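One way to avoid the error, independent of the Kudu API itself, is to restrict the delete batch to keys that are known to exist before issuing the write. A minimal Python sketch of that filtering step (the function name and data shapes are illustrative, not part of kudu-spark):

```python
def filter_deletable(delete_keys, existing_keys):
    """Keep only the keys that actually exist in the table, so a
    DELETE batch never references a missing row."""
    existing = set(existing_keys)          # O(1) membership tests
    return [k for k in delete_keys if k in existing]

# Keys 1 and 5 are not in the table, so they are dropped from the batch.
print(filter_deletable([1, 2, 5], [2, 3, 4]))  # [2]
```

In Spark terms the same effect can be had by joining the delete DataFrame against the table's current keys before writing, at the cost of one extra scan.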
Hi all,
I have a series of doubts about the CacheManager used by SQLContext to
cache DataFrames.
My use case requires different threads persisting/reading DataFrames
concurrently. While using Spark, I realized that persistence does not
really work in parallel.
I would like it if I'm persisting a data
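On the concurrency question: a registry that serializes persist/lookup under a single lock guarantees each dataset is computed and cached exactly once, even with many threads racing. A toy model of that locking idea (this is not Spark's CacheManager; names are illustrative):

```python
import threading

class ThreadSafeCacheManager:
    """Toy registry that serializes concurrent persist/lookup calls
    with a single lock (the locking idea only, not Spark's CacheManager)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cached = {}

    def cache(self, name, compute):
        # Only one thread at a time may compute and register a dataset,
        # so a given name is materialized exactly once.
        with self._lock:
            if name not in self._cached:
                self._cached[name] = compute()
            return self._cached[name]

mgr = ThreadSafeCacheManager()
calls = []
workers = [threading.Thread(target=lambda: mgr.cache("df1", lambda: calls.append(1) or "data"))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(calls))  # 1 -- the dataset was computed exactly once
```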
Hi all,
I have a Spark application running to which I continuously submit jobs.
These jobs use different instances of SQLContext, so the application's web
UI fills up with more and more of these instances.
Is there any way to prevent this? I don't want to see every created
SQLContext in the web
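One common way to keep the UI from filling up is to share a single SQLContext across all submitted jobs instead of creating one per job. A minimal sketch of the lazy get-or-create pattern (plain Python; the factory stands in for whatever builds the context):

```python
import threading

_lock = threading.Lock()
_context = None

def get_or_create_context(factory):
    """Return one shared context for the whole application,
    creating it lazily on first use (double-checked under a lock)."""
    global _context
    if _context is None:
        with _lock:
            if _context is None:
                _context = factory()
    return _context

a = get_or_create_context(object)   # first call creates the context
b = get_or_create_context(object)   # later calls reuse the same one
print(a is b)  # True
```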
Hi all,
Is there a way to prevent eviction of RDDs from the SparkContext?
I would rather not use the cache with its default (LRU) eviction behavior;
I would manually unpersist RDDs cached in memory/disk.
Thanks in advance,
Pietro.
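To pin down the behavior being asked for, here is a toy cache whose entries are never dropped automatically (no LRU) and only leave through an explicit unpersist() call. This is an illustration of the desired semantics, not Spark's API; in Spark itself, a disk-backed storage level at least avoids silently losing blocks under memory pressure:

```python
class PinnedCache:
    """Entries are never evicted automatically; they only leave the
    cache through an explicit unpersist() call."""

    def __init__(self):
        self._entries = {}

    def persist(self, key, value):
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)

    def unpersist(self, key):
        self._entries.pop(key, None)

cache = PinnedCache()
cache.persist("rdd1", [1, 2, 3])
print(cache.get("rdd1"))   # [1, 2, 3] -- stays until unpersisted
cache.unpersist("rdd1")
print(cache.get("rdd1"))   # None
```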
Hi all,
How can I use the "NOT IN" clause in Spark SQL 1.2?
It keeps giving me syntax errors, but the query is valid SQL.
Thanks in advance,
Best regards,
Pietro.
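A common workaround when NOT IN subqueries are not supported is to rewrite them as a LEFT OUTER JOIN plus an IS NULL filter. The rewrite is shown here with SQLite standing in for Spark SQL (table and column names are made up); it returns the same rows the NOT IN query would:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
cur.execute("CREATE TABLE blacklist (customer TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "alice"), (2, "bob"), (3, "carol")])
cur.execute("INSERT INTO blacklist VALUES ('bob')")

# Equivalent of:
#   SELECT id FROM orders
#   WHERE customer NOT IN (SELECT customer FROM blacklist)
rows = cur.execute("""
    SELECT o.id
    FROM orders o
    LEFT OUTER JOIN blacklist b ON o.customer = b.customer
    WHERE b.customer IS NULL
    ORDER BY o.id
""").fetchall()
print(rows)  # [(1,), (3,)] -- bob's order is excluded
```

Note the two forms differ if the subquery can produce NULLs: NOT IN returns no rows in that case, while the join form still returns the non-matching rows.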
Hi all,
What is the best way to remotely debug Spark apps, with breakpoints?
Thanks in advance,
Best regards!
Pietro
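One common approach is to start the driver JVM with a JDWP debug agent and attach the IDE's remote debugger to it. A sketch (the class and jar names are placeholders; the port is arbitrary):

```shell
# Driver suspends at startup until a remote debugger attaches on port 5005.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.MyApp \
  myapp.jar
```

Executors can be debugged the same way via spark.executor.extraJavaOptions, though with several executors per node a fixed port can clash.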
Hi all,
what is the best way to perform Spark SQL queries and obtain the result
tuples in a streaming fashion? In particular, I want to aggregate data and
obtain the first, incomplete results quickly, then have them updated until
the aggregation is complete.
Best Regards.
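The incremental part can be sketched independently of Spark: fold each micro-batch into a running aggregate and emit a snapshot after every batch, so consumers see early partial results that converge to the final answer (plain Python, illustrative only):

```python
from collections import defaultdict

def incremental_sums(batches):
    """Yield the running (partial) per-key sums after each micro-batch,
    so consumers get early, incomplete aggregates that are refined
    until every batch has arrived."""
    totals = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            totals[key] += value
        yield dict(totals)   # snapshot of the partial result so far

batches = [[("a", 1), ("b", 2)], [("a", 3)]]
results = list(incremental_sums(batches))
print(results)  # [{'a': 1, 'b': 2}, {'a': 4, 'b': 2}]
```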
Hi all,
I have to estimate resource requirements for my Hadoop/Spark cluster. In
particular, I have to query about 100 TB of an HBase table to do
aggregations with Spark SQL.
What is, approximately, the most suitable cluster configuration for my use
case, in order to query the data quickly? Lastly, can you tell me how to
use Spark as a service in yarn-cluster mode?
Thanks in advance and Best Regards,
Pietro Gentile
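Exact sizing depends heavily on the workload, but a back-of-envelope storage calculation is a reasonable starting point. Every constant below is an assumption to be replaced with real measurements:

```python
import math

# Back-of-envelope sizing; every constant here is an assumption.
dataset_tb = 100               # raw HBase data to query
replication = 3                # HDFS replication factor (default)
overhead = 1.3                 # headroom for compactions/temp space (assumed)
usable_disk_per_node_tb = 24   # e.g. 12 x 2 TB disks per node (assumed)

raw_tb = dataset_tb * replication * overhead          # 390 TB on disk
nodes_for_storage = math.ceil(raw_tb / usable_disk_per_node_tb)
print(nodes_for_storage)  # 17 nodes just to hold the data
```

Memory and CPU for Spark SQL aggregations would then be sized separately, on top of this storage floor.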
Hi everyone,
I deployed Spark 1.1.0 and I'm trying to use it with spark-jobserver 0.4.0
(https://github.com/ooyala/spark-jobserver).
I previously used Spark 1.0.2 and had no problems with it. I want to use the
newer version of Spark (and Spark SQL) to create SchemaRDDs programmatically.