Hello, why does Spark usually hit an off-heap OOM in the shuffle reader? I read some
source code: when a ResultTask reads shuffle data from a non-local executor, it has a
buffer and spills to disk, so why the off-heap OOM?
jib...@qq.com
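One commonly cited factor is that remote shuffle fetches are buffered in Netty direct (off-heap) memory before any spilling can happen, so the spill path alone doesn't bound off-heap use. A few settings are commonly tuned for this kind of failure; a sketch for spark-defaults.conf (the sizes are placeholders, adjust to your cluster):

```
# Give off-heap allocations an explicit budget instead of relying on defaults
spark.memory.offHeap.enabled        true
spark.memory.offHeap.size           2g
# Cap how much remote shuffle data each reduce task buffers in flight
spark.reducer.maxSizeInFlight       24m
# Stream very large remote blocks to disk instead of holding them in memory
spark.maxRemoteBlockSizeFetchToMem  200m
```

This is a tuning sketch, not a diagnosis; checking the executor's direct-memory usage at failure time would confirm which buffer is actually overflowing.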
Depending on the Alluxio version you are running (e.g., 2.0), the
metrics for local short-circuit reads are not turned on by default.
So I would suggest first enabling collection of local
short-circuit read metrics by setting
alluxio.user.metrics.collection.enabled=true
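To apply it cluster-wide rather than per job, the property would go in conf/alluxio-site.properties (a minimal fragment; clients pick it up on restart):

```
# conf/alluxio-site.properties
# Enable client-side metrics so short-circuit reads show up in the metrics report
alluxio.user.metrics.collection.enabled=true
```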
Regarding the
Hi Mark,
You can follow the instructions here:
https://docs.alluxio.io/os/user/stable/en/compute/Spark.html#customize-alluxio-user-properties-for-individual-spark-jobs
Something like this:
$ spark-submit \
    --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH'
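For properties that matter on the executors as well (not just the driver), the linked docs pass the same -D option through spark.executor.extraJavaOptions too. A hedged sketch extending the command above (the application jar name is a placeholder):

```
$ spark-submit \
    --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
    --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
    your-app.jar
```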
Hello,
I have 2 Parquet datasets (each containing 1 file):
- parquet-wide - schema has 25 top-level cols + 1 array
- parquet-narrow - schema has 3 top-level cols
Both files have the same data for the shared columns.
When I read from parquet-wide, Spark reports read 52.6 KB; from
parquet-narrow only 2.6
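A size gap like this is what a columnar format is expected to produce: Parquet stores each column in its own chunks, so a reader that prunes columns skips the bytes of the columns it doesn't need, while a wider file still pays per-column metadata overhead. A toy pure-Python sketch of the idea (not Parquet itself; column names and sizes are made up):

```python
# Toy "columnar file": each column is stored as its own byte blob,
# so a reader can fetch only the columns it needs.
def write_columnar(table: dict) -> dict:
    # Encode each column independently (naive fixed-width 4-byte ints).
    return {name: b"".join(v.to_bytes(4, "little") for v in vals)
            for name, vals in table.items()}

def read_columns(storage: dict, wanted: list) -> int:
    # A pruned read touches only the requested columns' bytes.
    return sum(len(storage[name]) for name in wanted)

rows = 1000
wide = {f"c{i}": list(range(rows)) for i in range(26)}  # 26 columns
storage = write_columnar(wide)

full_scan = read_columns(storage, list(storage))   # all 26 columns
pruned = read_columns(storage, ["c0", "c1", "c2"]) # only 3 columns

print(full_scan, pruned)  # the pruned read is exactly 3/26 of the full scan
```

In real Parquet the ratio is less exact because of encodings, compression, and footer metadata, but the mechanism is the same.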
I am also interested. Many of the docs/books that I've seen are
practical, with examples of usage, rather than about the deep internals of Spark.
On Wed, 18 Sep 2019 21:12:12 -1100 vipul.s.p...@gmail.com wrote
Hi,
Consider the following statements:
1)
> scala> val df = spark.read.format("com.shubham.MyDataSource").load
> scala> df.show
> +---+---+
> | i| j|
> +---+---+
> | 0| 0|
> | 1| -1|
> | 2| -2|
> | 3| -3|
> | 4| -4|
> +---+---+
> 2)
> scala> val df1 = df.filter("i < 3")
> scala> df1.show
Hi,
How can I create an initial state by hand so that the structured streaming
file source only reads data that is semantically greater (i.e., file paths
compared lexicographically) than the minimum committed initial state?
Details here:
Yes,
I realize what you were looking for; I am also looking for the same docs.
Haven't found them yet. Also, Jacek Laskowski's GitBooks are the next best
thing to follow, if you haven't already.
Regards
On Thu, Sep 19, 2019 at 12:46 PM wrote:
> Thanks Vipul,
>
>
>
> I was looking specifically for
Thanks Vipul,
I was looking specifically for the documents Spark committers use for reference.
Currently I've put custom logs in the spark-core sources, then build it and run
jobs on it.
From the printed logs I try to understand the execution flows.
From: Vipul Rajan
Sent: Thursday, September 19, 2019
https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/2-JobLogicalPlan.md
This is pretty old, but it might help a little. I myself am going
through the source code and trying to reverse-engineer things. Let me know
if you'd like to pool resources sometime.
Regards
On Thu, Sep
Hi,
Can someone provide documents/links (apart from the official documentation) for
understanding the internal workings of spark-core?
Documents containing component pseudocode, class diagrams, execution flows,
etc.
Thanks, Kamal