> data for some queries and not others?
>
> It sounds like an interesting problem…
>
> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
> Hi All,
>
>On submitting 20 identical SQL queries in parallel to the Spark Thrift Server, the qu
rrency is affected by the single driver. How can the concurrency be improved,
and what are the best practices?
Thanks,
Prabhu Joseph
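One direction often suggested for this (a sketch only, assuming a Spark 1.x setup; verify the property names against your version's documentation) is to run the Thrift Server's single driver with the FAIR scheduler, so parallel queries share executor slots instead of queuing FIFO behind each other:

```
# spark-defaults.conf (sketch)
spark.scheduler.mode             FAIR
spark.scheduler.allocation.file  /path/to/fairscheduler.xml
```

The allocation file defines pools with weights and minimum shares; this does not remove the single-driver bottleneck, but it stops one heavy query from starving the other nineteen.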
ted. Meanwhile, there were 41405 Tasks in the
> 163 Stages that were skipped.
>
> I think so -- but the Spark UI's accounting may not be 100% accurate and
> bug-free.
>
> On Tue, Mar 15, 2016 at 6:34 PM, Prabhu Joseph <prabhujose.ga...@gmail.com
> > wrote:
>
>> Okay
Stages and Tasks below.
>>>
>>> Job ID: 11
>>> Description: count
>>> Submitted: 2016/03/14 15:35:32
>>> Duration: 1.4 min
>>> Stages (Succeeded/Total): 164/164 (163 skipped)
>>> Tasks (for all stages, Succeeded/Total): 19841/19788 (41405 skipped)
>>> Thanks,
>>> Prabhu Joseph
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
/14 15:35:32  1.4 min  164/164 (163 skipped)  19841/19788 (41405 skipped)
Thanks,
Prabhu Joseph
in pyspark script.
DEFAULT_PYTHON="/ANACONDA/anaconda2/bin/python2.7"
Thanks,
Prabhu Joseph
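For reference, the standard way to point PySpark at a specific interpreter is the PYSPARK_PYTHON environment variable (a sketch reusing the path from this message; adjust it to your installation):

```shell
# Make PySpark use the Anaconda interpreter (path taken from the message above)
export PYSPARK_PYTHON="/ANACONDA/anaconda2/bin/python2.7"
```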
On Tue, Mar 15, 2016 at 11:52 AM, Stuti Awasthi <stutiawas...@hcl.com>
wrote:
> Hi All,
>
>
>
> I have a Centos cluster (without any sudo permissions) which has by
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 March 2016 at 08:06, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> Which version of Spark are you using? The configuration varies by version.
>>
>> Regards
>> Sab
ply swap
> the fractions in your case.
>
> Regards
> Sab
>
> On Mon, Mar 14, 2016 at 2:20 PM, Prabhu Joseph <prabhujose.ga...@gmail.com
> > wrote:
>
>> It is a Spark-SQL and the version used is Spark-1.2.1.
>>
>> On Mon, Mar 14, 2016 at 2:16 PM, Sabarish
>>
>>
>>
>> On 14 March 2016 at 08:06, Sabarish Sasidharan <
>> sabarish.sasidha...@manthan.com> wrote:
>>
>>> Which version of Spark are you using? The configuration varies by
>>> version.
>>>
>>> Regards
>>>
for cache. So, when a Spark Executor has a lot of memory available for cache
and does not use it, but there is a need to do a lot of shuffle, will the
executor only use the shuffle fraction that is set for shuffling, or will it
use the free memory available for cache as well?
Thanks,
Prabhu Joseph
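For context, pre-1.6 Spark splits the executor heap with two static fractions, so under that scheme shuffle does not borrow unused cache space (a spark-defaults.conf sketch showing the 1.x defaults; Spark 1.6+ replaced these with unified memory management, where the two regions can borrow from each other):

```
spark.storage.memoryFraction   0.6   # fraction of heap for the RDD cache
spark.shuffle.memoryFraction   0.2   # fraction of heap for shuffle aggregation buffers
```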
Looking at ExternalSorter.scala line 192, i suspect some input record has
Null key.
189  while (records.hasNext) {
190    addElementsRead()
191    kv = records.next()
192    map.changeValue((getPartition(kv._1), kv._1), update)
On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph <prabhujose
Looking at ExternalSorter.scala line 192
189  while (records.hasNext) {
190    addElementsRead()
191    kv = records.next()
192    map.changeValue((getPartition(kv._1), kv._1), update)
193    maybeSpillCollection(usingMap = true)
194  }
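If the input can legitimately contain null keys, one illustrative workaround (not from the thread, and the helper name is hypothetical) is to drop such records before they reach the partition-by-key step:

```python
# Hypothetical guard: drop records whose key is None before a
# partition-by-key shuffle step would dereference the key.
def drop_null_keys(records):
    return [(k, v) for (k, v) in records if k is not None]

records = [("a", 1), (None, 2), ("b", 3)]
print(drop_null_keys(records))  # [('a', 1), ('b', 3)]
```

In an actual Spark job the same idea would be a filter on the RDD before the shuffle.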
On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru
wrote:
> I am seeing
,
Prabhu Joseph
On Fri, Mar 11, 2016 at 3:45 AM, Ashok Kumar <ashok34...@yahoo.com.invalid>
wrote:
>
> Hi,
>
> We intend to use 5 servers which will be utilized for building Bigdata
> Hadoop data warehouse system (not using any propriety distribution like
> Hortonworks or Cl
just want to be able to replicate hot cached blocks right?
>
>
> On Tuesday, March 8, 2016, Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> When a Spark Job is running, and one of the Spark Executor on Node A
>> has some partitio
shuffle files from an external service instead of from each other, which
offloads work from the Spark Executors. We want to check whether a similar
external service is implemented for transferring cached partitions to other
executors.
Thanks, Prabhu Joseph
Hi All,
What is the difference between the Spark Partitioner and the Spark Shuffle
Manager? The Spark Partitioner is by default the hash partitioner, and the
Spark shuffle manager is sort-based; the others are Hash and Tungsten Sort.
Thanks,
Prabhu Joseph
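To make the distinction concrete: the partitioner only answers "which partition does this key go to", while the shuffle manager decides how the resulting blocks are written and fetched. A minimal model of hash partitioning (illustrative Python, not Spark's actual code):

```python
def hash_partition(key, num_partitions):
    # Roughly what a hash partitioner does: modulo of the key's hash.
    return hash(key) % num_partitions

# The same key always lands in the same partition within a run:
print(hash_partition("user42", 5) == hash_partition("user42", 5))  # True
```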
= {
  val pieces = line.split(' ')
  val level = pieces(2).toString
  val one = pieces(0).toString
  val two = pieces(1).toString
  (level, LogClass(one, two))
}
val output = logData.map(x => parse(x))
val partitioned = output.partitionBy(new ExactPartitioner(5)).persist()
val groups = partitioned.groupByKey(new ExactPartitioner(5))
groups.count()
output.partitions.size
partitioned.partitions.size
}
}
Thanks,
Prabhu Joseph
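The ExactPartitioner used above is not shown in the thread; a hypothetical model of what such a partitioner typically does (map a known integer key range directly onto partitions instead of hashing), purely for illustration:

```python
class ExactPartitioner:
    """Hypothetical sketch: sends integer key k straight to partition k % n."""
    def __init__(self, n):
        self.num_partitions = n

    def get_partition(self, key):
        return key % self.num_partitions

p = ExactPartitioner(5)
print([p.get_partition(k) for k in range(7)])  # [0, 1, 2, 3, 4, 0, 1]
```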
Were all NodeManager services restarted after the change in yarn-site.xml?
On Thu, Mar 3, 2016 at 6:00 AM, Jeff Zhang wrote:
> The executor may fail to start. You need to check the executor logs, if
> there's no executor log then you need to check node manager log.
>
> On Wed,
Hi All,
I am trying to enable DEBUG logging for the Spark ApplicationMaster, but it
is not working. When running the Spark job, I passed
-Dlog4j.configuration=file:/opt/mapr/spark/spark-1.4.1/conf/log4j.properties
The log4j.properties has log4j.rootCategory=DEBUG, console
The Spark Executor containers have DEBUG logs, but
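In YARN mode the log4j file usually also has to be shipped to the ApplicationMaster container and referenced by its relative name (a sketch assuming yarn-cluster mode, where the AM runs the driver; flag and property names as in the Spark 1.x docs, path taken from this message):

```
spark-submit \
  --files /opt/mapr/spark/spark-1.4.1/conf/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  ...
```

With a file: URI pointing at a local path, the AM container cannot see the file unless it exists at that exact path on whichever node the AM lands on, which would explain executors logging DEBUG while the AM does not.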
Matthias,
Can you try appending the jars to LAUNCH_CLASSPATH in
spark-1.4.1/sbin/spark_class?
2016-03-02 21:39 GMT+05:30 Matthias Niehoff :
> no, not to driver and executor but to the master and worker instances of
> the spark standalone cluster
>
> Am 2.
if Java's old threading is used somewhere.
On Friday, February 19, 2016, Jörn Franke <jornfra...@gmail.com> wrote:
> How did you configure YARN queues? What scheduler? Preemption ?
>
> > On 19 Feb 2016, at 06:51, Prabhu Joseph <prabhujose.ga...@gmail.com
> <javascrip
are taking 2-3 times longer than A,
which shows concurrency does not improve with shared Spark Context. [Spark
Job Server]
Thanks,
Prabhu Joseph
he.spark.sql.hive.HiveContext]
>
> res0: Boolean = true
>
>
>
> On Mon, Feb 15, 2016 at 8:51 PM, Prabhu Joseph <prabhujose.ga...@gmail.com
> > wrote:
>
>> Hi All,
>>
>> On creating HiveContext in spark-shell, fails with
>>
>> Caused by:
ache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]
So is there any real need for HiveContext inside the Spark shell? Is
everything that can be done with HiveContext achievable with SQLContext
inside the Spark shell?
Thanks,
Prabhu Joseph
SPARK_MASTER_IP at
worker nodes.
Check the logs of the other running workers to see which SPARK_MASTER_IP
they connected to; I don't think it is using a wrong master IP.
Thanks,
Prabhu Joseph
On Mon, Feb 15, 2016 at 12:34 PM, Kartik Mathur <kar...@bluedata.com> wrote:
> Thanks Prabhu ,
>
>
in Worker nodes are
exactly the same as what Spark Master GUI shows.
Thanks,
Prabhu Joseph
On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur <kar...@bluedata.com> wrote:
> on spark 1.5.2
> I have a spark standalone cluster with 6 workers , I left the cluster idle
> for 3 days and aft
in hadoop-2.5.1, and hence spark.yarn.dist.files does not work with
hadoop-2.5.1. spark.yarn.dist.files works fine on hadoop-2.7.0, as CWD/* is
included in the container classpath through some bug fix. Searching for the
JIRA.
Thanks,
Prabhu Joseph
On Wed, Feb 10, 2016 at 4:04 PM, Ted Yu <yuz
of hbase client jars. When I checked launch_container.sh, the classpath does
not have $PWD/*, and hence all the hbase client jars are ignored.
Is spark.yarn.dist.files not for adding jars to the executor classpath?
Thanks,
Prabhu Joseph
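A common alternative on Spark 1.x/YARN (a sketch; property names per the 1.x docs, jar path hypothetical) is to distribute the jars and put them on the executor classpath explicitly rather than relying on $PWD/* being present:

```
spark.yarn.dist.files          /path/to/hbase-client.jar
spark.executor.extraClassPath  ./hbase-client.jar
```

Since dist.files lands in each container's working directory, the relative ./ entry resolves there.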
On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose
://issues.apache.org/jira/browse/SPARK-5342
spark.yarn.credentials.file
How to renew the AMRMToken for a long running job on YARN?
Thanks,
Prabhu Joseph
+ Spark-Dev
On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:
> Hi All,
>
> A long running Spark job on YARN throws the below exception after running
> for a few days.
>
> yarn.ApplicationMaster: Reporter thread
up and launching it on a
less-local node.
So after setting it to 0, all tasks started in parallel. But I learned that
it is better not to reduce it to 0.
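For reference, the knob being discussed is spark.locality.wait (a spark-defaults.conf sketch; 3s is the documented default, and 0 disables the wait entirely at the cost of data locality):

```
spark.locality.wait   0s
```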
On Mon, Feb 1, 2016 at 2:02 PM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:
> Hi All,
>
>
> Sample Spark application which re
then programming
> must be the process of putting ..."
> - Edsger Dijkstra
>
> "If you pay peanuts you get monkeys"
>
>
> 2016-02-04 11:33 GMT+01:00 Prabhu Joseph <prabhujose.ga...@gmail.com>:
>
>> Okay, the reason for the task delay within executor
does not have enough heap.
Thanks,
Prabhu Joseph
On Thu, Feb 4, 2016 at 11:25 AM, fightf...@163.com <fightf...@163.com>
wrote:
> Hi,
>
> I want to make sure that the cache table indeed would accelerate sql
> queries. Here is one of my use case :
> impala table size : 24.5
, saveAsHadoopFile runs fine.
What could be the reason for ExecutorLostFailure when the number of cores
per executor is high?
Error: ExecutorLostFailure (executor 3 lost)
16/02/02 04:22:40 WARN TaskSetManager: Lost task 1.3 in stage 15.0 (TID
1318, hdnprd-c01-r01-14):
Thanks,
Prabhu Joseph
2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated:
app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated:
app-20160201065319-0014/2848 is now RUNNING
....
Thanks,
Prabhu Joseph
Thanks Ted. My concern is how to avoid these kinds of user errors on a
production cluster; it would be better if Spark handled this instead of
creating an executor every second, failing, and overloading the Spark
Master. Shall I report a Spark JIRA to handle this?
Thanks,
Prabhu Joseph
application attempt, there are many
finishApplicationMaster requests causing the ERROR.
Need your help to understand on what scenario the above happens.
JIRA's related are
https://issues.apache.org/jira/browse/SPARK-1032
https://issues.apache.org/jira/browse/SPARK-3072
Thanks,
Prabhu Joseph
machine; jps -l will list all Java
processes, and jstack -l <pid> will give the stack trace.
Thanks,
Prabhu Joseph
On Mon, Jan 11, 2016 at 7:56 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi Prabhu thanks for the response. How do I find pid of a slow running
> task. Task is running in
for every 2 seconds, 1 minute in total. This will help identify the code
where threads are spending a lot of time, which you can then try to tune.
Thanks,
Prabhu Joseph
On Sat, Jan 2, 2016 at 1:28 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi thanks I did that and I have attached thread du
Take a thread dump of the Executor process several times in a short time
period and check what each thread is doing at the different times; this will
help identify the expensive sections in the user code.
Thanks,
Prabhu Joseph
On Sat, Jan 2, 2016 at 3:28 AM, unk1102 <umesh.ka...@gmail.com>