Hi
This *java.nio.channels.ClosedChannelException* is often caused by a
connection timeout
between your Spark executors and Alluxio workers.
One simple and quick fix is to increase the value of this timeout property:
alluxio.user.network.netty.timeout
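For example, the Alluxio client property can be passed through to the driver and executor JVMs via Spark configuration (a sketch; the 10-minute value is illustrative, not a recommendation — pick something suited to your network):

```
spark.driver.extraJavaOptions=-Dalluxio.user.network.netty.timeout=10min
spark.executor.extraJavaOptions=-Dalluxio.user.network.netty.timeout=10min
```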
Hi Andy,
Assuming you are running Spark on YARN, I would recommend deploying
Alluxio in the same YARN cluster if you are looking for the best performance.
Alluxio can also be deployed separately as a standalone service, but in that
case you may need to transfer data from the Alluxio cluster to your Spark cluster.
Let's suppose we're dealing with a non-secured (i.e. not Kerberized)
YARN cluster. When I invoke spark-submit, is there a practical
difference between specifying --proxy-user=foo (supposing
impersonation is properly set up) and setting the environment variable
HADOOP_USER_NAME=foo? Thanks for any insight.
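For comparison, the two submissions would look like this (a sketch; app.jar and the class name are placeholders):

```
# Impersonation: authenticate as the real submitting user, act as "foo".
# Requires hadoop.proxyuser.<submitter>.* settings in core-site.xml.
spark-submit --master yarn --proxy-user foo --class com.example.App app.jar

# Client-asserted identity: on a non-Kerberized cluster, Hadoop simply
# trusts this value, so the job and its HDFS writes run as "foo".
HADOOP_USER_NAME=foo spark-submit --master yarn --class com.example.App app.jar
```

One practical difference: --proxy-user keeps an auditable record of who actually submitted the job, whereas HADOOP_USER_NAME replaces the client-side identity entirely.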
You can check out
https://github.com/hortonworks-spark/spark-atlas-connector/
On Wed, 15 May 2019 at 19:44, lk_spark wrote:
> hi,all:
> When I use Spark, if I run some SQL to do ETL, how can I get
> lineage info? I found that CDH Spark has some config about lineage:
>
Hi everyone.
I am doing my master's thesis on the topic of automatic parameter tuning of
graph processing frameworks. Currently we are aiming to optimize GraphX jobs. I
have an initial list of parameters which we would like to tune:
spark.memory.fraction
spark.executor.memory
spark.shuffle.compress
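These can all be supplied per job at submission time, which makes them convenient knobs for an automated tuner (a sketch; the values below are Spark's defaults or illustrative, not tuned recommendations, and the jar and class names are placeholders):

```
spark-submit --master yarn \
  --conf spark.memory.fraction=0.6 \
  --conf spark.executor.memory=4g \
  --conf spark.shuffle.compress=true \
  --class com.example.GraphXJob graphx-tuning.jar
```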
One of the reasons that any job running on YARN (Spark, MR, Hive, etc.) can
get stuck is a data-unavailability issue with HDFS.
This can arise either if the NameNode is not reachable or if a particular
data block is unavailable due to node failures.
Can you check if your YARN service
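The HDFS side of this can be checked quickly with the standard admin commands (a sketch; run as a user with HDFS access):

```
# Report live/dead DataNodes and overall capacity; fails fast if the
# NameNode is unreachable
hdfs dfsadmin -report

# Scan the namespace for missing or corrupt blocks
hdfs fsck /
```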
Hi All,
I am getting GC overhead while reading a table from Hive from Spark like:
spark.sql("SELECT * FROM some.table WHERE date='2019-05-14' LIMIT 10").show()
When I run the above command in spark-shell, it starts processing *1780
tasks* and goes OOM at a specific partition.
1. You are looking at the diagram without looking at the underlying request.
The behavior of state collection depends on the request and the output
mode of the query.
In the example you cite:
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)  // 9999 is the port used in the guide's example
  .load()
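To see how the output mode changes what gets emitted, the guide's word-count continuation of this example can be run with different modes (a sketch taken from that guide, assuming the same `spark` session; "complete" re-emits the entire aggregate table each trigger, so all word counts must be kept as state, while "update" emits only the rows that changed):

```scala
import spark.implicits._

val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count()

val query = wordCounts.writeStream
  .outputMode("complete")  // try "update" to emit only changed rows
  .format("console")
  .start()
```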
Hi,
spark.lineage.enabled is Cloudera-specific and doesn't work with vanilla
Spark.
BR,
G
On Thu, May 16, 2019 at 4:44 AM lk_spark wrote:
> hi,all:
> When I use Spark, if I run some SQL to do ETL, how can I get
> lineage info? I found that CDH Spark has some config about lineage:
On YARN.
On Thu, May 16, 2019 at 1:36 AM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi Rishi,
>
> Are you running spark on YARN or spark's master-slave cluster?
>
> Akshay Bhardwaj
> +91-97111-33849
>
>
> On Thu, May 16, 2019 at 7:15 AM Rishi Shah
> wrote:
>
>> Any one please?
>>
Thanks Ayan, I wasn't aware of such a user group specifically for Databricks.
Thanks for the input, much appreciated!
On Wed, May 15, 2019 at 10:07 PM ayan guha wrote:
> Well, it's a Databricks question, so it is better asked in their forum.
>
> You can set up cluster-level params when you create a new cluster.
Tagging mail to hopefully get a quicker response
On Thu 16 May, 2019, 3:08 PM Sheel Pancholi, wrote:
> Hello,
>
> Along with what I sent before, I want to add that I went over the
> documentation at
> https://github.com/apache/spark/blob/master/docs/structured-streaming-programming-guide.md
>
>
Hello,
Along with what I sent before, I want to add that I went over the
documentation at
https://github.com/apache/spark/blob/master/docs/structured-streaming-programming-guide.md
Here is an excerpt:
[image: Model]
Hello Russell,
Thanks for clarifying. I went over the Catalyst Optimizer Deep Dive video
at https://www.youtube.com/watch?v=RmUn5vHlevc and that along with your
explanation made me realize that the DataFrame is the new DStream in
Structured Streaming. If my understanding is correct, request