Hi,
This JIRA, https://issues.apache.org/jira/browse/SPARK-8813, is fixed in Spark
2.0, but the resolution is not described there.
In our use case, both large and many small Parquet files are being queried
using Spark SQL. Can someone please explain what the fix is and how I can use it?
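One relevant mechanism in Spark 2.0 is that the file-based Parquet reader packs multiple small files into a single input partition, controlled by two real configs; whether this is the actual resolution of SPARK-8813 is an assumption, not confirmed in the thread. A minimal spark-shell sketch (`spark` is the SparkSession predefined there):

```scala
// Hedged sketch: these Spark 2.0 SQL configs control how many small files
// are packed into one input partition when reading file-based sources.
// It is an assumption that this behavior is the fix for SPARK-8813.
spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728L) // max bytes per partition (128 MB default)
spark.conf.set("spark.sql.files.openCostInBytes", 4194304L)     // estimated cost of opening a file (4 MB default)
```

With these set, many small Parquet files are grouped into fewer, larger partitions instead of producing one tiny task per file.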
| spark.speculation.quantile | 0.75 | Fraction of tasks which must be complete before speculation is enabled for a particular stage. |
| spark.speculation.multiplier | 1.5 | How many times slower a task is than the median to be considered for speculation. |
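The properties quoted above can be set on the SparkConf before the context is created; a minimal sketch (values shown are Spark's documented defaults, except `spark.speculation` itself, which defaults to false):

```scala
import org.apache.spark.SparkConf

// Enable speculative execution: slow tasks are re-launched on other nodes.
val conf = new SparkConf()
  .set("spark.speculation", "true")        // turn speculation on (default: false)
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks done before speculating
  .set("spark.speculation.multiplier", "1.5")  // how many times slower than the median
```

The same keys can also be passed via `--conf` to spark-submit or placed in spark-defaults.conf.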
On Thursday, January 15, 2015 5:44 AM, Ajay Srivastava
a_k_srivast...@yahoo.com.INVALID wrote:
Hi,
My spark job is taking long
http://spark.apache.org/docs/latest/tuning.html#serialized-rdd-storage
Cheers,
- Nicos
On Jan 15, 2015, at 6:49 AM, Ajay Srivastava a_k_srivast...@yahoo.com.INVALID
wrote:
Thanks RK. I can turn on speculative execution, but I am trying to find out the
actual reason for the delay, as it happens on any node. Any idea about
Hi,
My Spark job is taking a long time. I see that some tasks take longer for the
same amount of data and shuffle read/write. What could be the possible
reasons for this?
The thread dump sometimes shows that all the tasks in an executor are waiting
with the following stack trace -
Executor task
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this.
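For reference, that setting can be applied per session before querying; a minimal sketch, assuming the `sqlContext` predefined in spark-shell:

```scala
// spark-shell sketch: apply the setting on the current SQLContext session.
// spark.sql.hive.convertMetastoreParquet controls whether Spark SQL uses its
// native Parquet reader for Hive metastore Parquet tables.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
```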
Regards,
Ajay
On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava
a_k_srivast...@yahoo.com.INVALID wrote:
Hi,
I am trying to read a parquet file using -
val parquetFile = sqlContext.parquetFile("people.parquet")
There is no way to specify that I am interested in reading only some columns
from disk. For example, if the Parquet file has 10 columns and I want to read
only 3 of them from disk.
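Column pruning can be expressed as a projection before any action; with the Parquet source, Spark SQL pushes the projection down so only the selected columns are read from disk. A minimal spark-shell sketch (the column names are hypothetical examples, not from the original file):

```scala
// Runs in spark-shell, where sqlContext is predefined.
// Column names (name, age, city) are hypothetical.
val parquetFile = sqlContext.parquetFile("people.parquet")
parquetFile.registerTempTable("people")

// Only the projected columns are read from the Parquet files on disk;
// the remaining columns are never materialized.
val threeCols = sqlContext.sql("SELECT name, age, city FROM people")
threeCols.collect()
```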
We have done
Hi,
I did not find any videos on the Apache Spark channel on YouTube yet.
Any idea when these will be made available?
Regards,
Ajay
Hi,
I was checking the different storage levels of an RDD and found OFF_HEAP.
Has anybody used this level?
If I use this level, where will the data be stored? If not in the heap, does it
mean that we can avoid GC?
How can I use this level? I did not find anything in the archive regarding this.
Can someone
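In the Spark 1.x releases this thread dates from, OFF_HEAP stored serialized RDD blocks outside the JVM heap (backed by Tachyon in those versions), so the cached data itself does not add GC pressure, although objects created while processing tasks are still garbage-collected normally. A minimal sketch, assuming a spark-shell session and a hypothetical input path:

```scala
import org.apache.spark.storage.StorageLevel

// sc is the SparkContext predefined in spark-shell; the path is hypothetical.
val rdd = sc.textFile("hdfs://namenode:8020/data/input.txt")

// Persist off-heap: cached blocks live outside the JVM heap, so the cached
// data is not scanned by the garbage collector.
rdd.persist(StorageLevel.OFF_HEAP)
println(rdd.count())
```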
a patch for it here:
https://github.com/apache/spark/pull/986. Feel free to try that if you’d like;
it will also be in 0.9.2 and 1.0.1.
Matei
On Jun 5, 2014, at 12:19 AM, Ajay Srivastava a_k_srivast...@yahoo.com wrote:
Sorry for replying late. It was night here.
Lian/Matei,
Here is the code
On Jun 4, 2014, at 12:58 PM, Xu (Simon) Chen xche...@gmail.com wrote:
Maybe your two workers have different assembly jar files?
I just ran into a similar problem that my spark-shell is using a different jar
file than my workers - got really confusing results.
On Jun 4, 2014 8:33 AM, Ajay
Hi,
I am doing a join of two RDDs which gives different results (counting the number
of records) each time I run this code on the same input.
The input files are large enough to be divided into two splits. When the program
runs on two workers with a single core assigned to each, the output is consistent