I found the answer regarding logging in the Javadoc of SparkLauncher:
"Currently, all applications are launched as child processes. The child's
stdout and stderr are merged and written to a logger (see
java.util.logging)."
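As a sketch of what that looks like in practice (assuming Spark 2.x, where the launcher gained redirectToLog; the app resource path, main class, and logger name below are hypothetical):

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithLogging {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/app.jar")   // hypothetical path
                .setMainClass("com.example.MyApp")    // hypothetical class
                .setMaster("yarn")
                // Merge the child's stdout/stderr into a java.util.logging
                // logger with this name instead of the default (Spark 2.0+).
                .redirectToLog("my.spark.launcher")
                .startApplication();
        // Note: handle.getAppId() may be null until the application has
        // actually been submitted; register a listener on the handle to be
        // notified of state transitions.
    }
}
```

This is a launch-configuration sketch, not runnable without a Spark installation on the classpath.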
One last question. sparkAppHandle.getAppId() - does this function
return
Thanks, Marcelo.
One more question regarding getting logs.
In the previous implementation of SparkLauncher we could read logs from:
sparkLauncher.getInputStream()
sparkLauncher.getErrorStream()
What is the recommended way of getting logs and logging Spark execution
while using
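With startApplication(), the child's output goes to a logger by default, but it can also be redirected explicitly; a minimal sketch (assuming Spark 2.0+, where redirectOutput and redirectError were added to the launcher; the paths and class names are hypothetical):

```java
import java.io.File;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithRedirect {
    public static void main(String[] args) throws Exception {
        new SparkLauncher()
                .setAppResource("/path/to/app.jar")   // hypothetical path
                .setMainClass("com.example.MyApp")    // hypothetical class
                // Merge the child's stderr into its stdout...
                .redirectError()
                // ...and send that combined stream to a file instead of the
                // default java.util.logging logger (Spark 2.0+).
                .redirectOutput(new File("/tmp/spark-driver.log"))
                .startApplication();
    }
}
```

This replaces the old pattern of consuming getInputStream()/getErrorStream() from the launched Process; it too is a configuration sketch that needs Spark on the classpath to run.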
Ah, looks similar. Next opportunity I get, I'm going to do a printSchema on
the two datasets and see if they don't match up.
I assume that unioning the underlying RDDs doesn't run into this problem
because there is less type checking, or something along those lines?
On Fri, Oct 21, 2016 at 3:39 PM
groupBy always materializes the entire group (on disk or in memory), which
is why you should avoid it for large groups.
The key is to never materialize the grouped and shuffled data.
To see one approach to do this take a look at
https://github.com/tresata/spark-sorted
It's basically a
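The streaming-over-sorted-data idea can be sketched in plain Java, independent of Spark: assuming the records arrive sorted by key (as repartitionAndSortWithinPartitions arranges within each partition), every group can be reduced on the fly without ever being held in memory as a whole. The sumPerKey helper below is illustrative, not part of any library:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StreamedGroups {
    // Walk key-sorted (key, value) pairs once, reducing each group as a
    // stream; only the current key and its running sum are kept in memory,
    // never a whole group.
    static List<String> sumPerKey(List<Map.Entry<String, Integer>> sorted) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> e : sorted) {
            if (currentKey != null && !currentKey.equals(e.getKey())) {
                out.add(currentKey + "=" + sum);  // group boundary: emit
                sum = 0;
            }
            currentKey = e.getKey();
            sum += e.getValue();
        }
        if (currentKey != null) out.add(currentKey + "=" + sum);
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = List.of(
                new SimpleEntry<>("a", 1),
                new SimpleEntry<>("a", 2),
                new SimpleEntry<>("b", 5));
        System.out.println(sumPerKey(sorted)); // [a=3, b=5]
    }
}
```

spark-sorted applies this same per-partition pattern to the shuffled data, which is how it avoids materializing groups the way groupBy does.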
I do not use rename; the files are written and then moved to a directory
on HDFS in gz format.
On 22 October 2016 at 15:14, Steve Loughran wrote:
>
> > On 21 Oct 2016, at 15:53, Nkechi Achara wrote:
> >
> > Hi,
> >
> > I am using Spark
Thank you for the reply, saurabh85. We do tune and adjust our shuffle
partition count, but that was not influencing the reading of the Parquet
files (the data is not shuffled as it is read, as I understand it).
Apologies: I actually did receive an answer, but it was not caught on the
mailing list.
On 22 Oct 2016, at 00:48, Chetan Khatri
> wrote:
Hello Cheng,
Thank you for response.
I am using Spark 1.6.1, and I am writing around 350 gzipped Parquet part
files for a single table. Processed around 180 GB of data using Spark.
Are you writing
> On 21 Oct 2016, at 15:53, Nkechi Achara wrote:
>
> Hi,
>
> I am using Spark 1.5.0 to read gz files with textFileStream, but when new
> files are dropped in the specified directory. I know this is only the case
> with gz files, as when I extract the file into the
I have a Spark optimization query that I have posted on Stack Overflow; any
guidance on this would be appreciated.
Please follow the link below, where I have explained the problem in depth
with code.
Hi,
I have a query regarding Spark stage optimization. I have asked the
question in more detail on Stack Overflow; please find the following link:
http://stackoverflow.com/questions/40192302/why-is-that-two-stages-in-apache-spark-are-computing-same-thing