Re: OFFICIAL USA REPORT TODAY India Most Dangerous : USA Religious Freedom Report out TODAY

2020-04-29 Thread akshay naidu
Today, Indians across the nation are mourning the demise of Irfan Khan, a
true Indian Muslim. And this idiot Zahid Amin, or whoever created this
bot (not sure if it is a bot or something), is spreading rumors about India.
The rights given to Muslims in India are far more open than in any other
Muslim-majority country. Well, it's not entirely this guy's fault. He's just
an idiot who was brainwashed as a kid by the kind of people who run camps
against humanity. These idiots are made to believe that by doing such
nonsense, 72 *hoor* will welcome them after death.. bullsh*t.
And it's because of your kind that other honest and real Muslims suffer, not
just in India but in every part of the world.

MODERATOR, PLEASE BLOCK THIS ACCOUNT AS SPAM.

On Wed, Apr 29, 2020 at 12:38 PM Zahid Amin  wrote:

> EVIL PROSPERS ONLY WHEN GOOD MEN DO NOTHING.
>
> I have done some good today. I unveiled Evil.
>
> FACT:
>  10 million Kasmiris Muslim and Chinese on Lockdown since August 2019.
>
> FACT : Citizen Amendment Bill
> Cast out non Hindu from India Beginning with Muslims.
>
> FACT:  OFFICIAL USA Report Religious Freedom :Recognition of those two
> Facts. India most Dangerous country for ethnic minorities.
>
> FACT  Pakistan created in 1947 for Muslims and ethnic minorities to live
> separate.
>
> FACT:  you Indians in IT industry are all brahmin etc. The purists the
> Hindutwa.
>
>
> FACT: My tribe fought in the 1857 Indian Rebellion and I am my next five
> generations without home.
>
>
> Sent: Wednesday, April 29, 2020 at 8:32 AM
> > From: "Deepak Sharma" 
> > To: "Gaurav Agarwal" 
> > Cc: "Zahid Amin" , "user" 
> > Subject: Re: OFFICIAL USA REPORT TODAY India Most Dangerous : USA
> Religious Freedom Report out TODAY
> >
> > I am unsubscribing until these hatemongers like Zahid Amin are removed or
> > blocked .
> > FYI Zahid Amin , Indian govt rejected the false report already .
> >
> >
> >
> > On Wed, 29 Apr 2020 at 11:58 AM, Gaurav Agarwal 
> > wrote:
> >
> > > Spark moderator supress this user please. Unnecessary Spam or apache
> spark
> > > account is hacked ?
> > >
> > > On Wed, Apr 29, 2020, 11:56 AM Zahid Amin  wrote:
> > >
> > >> How can it be rumours   ?
> > >> Of course you want  to suppress me.
> > >> Suppress USA official Report out TODAY .
> > >>
> > >> > Sent: Wednesday, April 29, 2020 at 8:17 AM
> > >> > From: "Deepak Sharma" 
> > >> > To: "Zahid Amin" 
> > >> > Cc: user@spark.apache.org
> > >> > Subject: Re: India Most Dangerous : USA Religious Freedom Report
> > >> >
> > >> > Can someone block this email ?
> > >> > He is spreading rumours and spamming.
> > >> >
> > >> > On Wed, 29 Apr 2020 at 11:46 AM, Zahid Amin 
> > >> wrote:
> > >> >
> > >> > > USA report states that India is now the most dangerous country for
> > >> Ethnic
> > >> > > Minorities.
> > >> > >
> > >> > > Remember Martin Luther King.
> > >> > >
> > >> > >
> > >> > >
> > >>
> https://www.mail.com/int/news/us/9880960-religious-freedom-watchdog-pitches-adding-india-to.html#.1258-stage-set1-3
> > >> > >
> > >> > > It began with Kasmir and still in locked down Since August 2019.
> > >> > >
> > >> > > The Hindutwa  want to eradicate all minorities .
> > >> > > The Apache foundation is infested with these Hindutwa purists and
> > >> their
> > >> > > sympathisers.
> > >> > > Making Sure all Muslims are kept away from IT industry. Using you
> to
> > >> help
> > >> > > them.
> > >> > >
> > >> > > Those people in IT you deal with are purists yet you are not
> welcome
> > >> India.
> > >> > >
> > >> > > The recognition of  Hindutwa led to the creation of Pakistan in
> 1947.
> > >> > >
> > >> > > Evil prospers when good men do nothing.
> > >> > > The genocide is not coming . It is Here.
> > >> > > I ask you please think and act.
> > >> > > Protect the Muslims from Indian Continent.
> > >> > >
> > >> > >
> -
> > >> > > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> > >> > >
> > >> > > --
> > >> > Thanks
> > >> > Deepak
> > >> > www.bigdatabig.com
> > >> > www.keosha.net
> > >> >
> > >>
> > >> -
> > >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> > >>
> > >> --
> > Thanks
> > Deepak
> > www.bigdatabig.com
> > www.keosha.net
> >
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: java vs scala for Apache Spark - is there a performance difference ?

2018-10-30 Thread akshay naidu
How about Python?
Java vs Scala vs Python vs R: which is better?

On Sat, Oct 27, 2018 at 3:34 AM karan alang  wrote:

> Hello
> - is there a "performance" difference when using Java or Scala for Apache
> Spark ?
>
> I understand, there are other obvious differences (less code with scala,
> easier to focus on logic etc),
> but wrt performance - i think there would not be much of a difference
> since both of them are JVM based,
> pls. let me know if this is not the case.
>
> thanks!
>


Re: [Spark Optimization] Why is one node getting all the pressure?

2018-06-11 Thread akshay naidu
try
 --num-executors 3 --executor-cores 4 --executor-memory 2G --conf
spark.scheduler.mode=FAIR
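Put together with the submitter's own master URL and script path (quoted below), the suggested flags would give a spark-submit along these lines (a sketch, not a tested command):

```shell
spark-submit \
  --master spark://192.168.49.37:7077 \
  --num-executors 3 \
  --executor-cores 4 \
  --executor-memory 2G \
  --conf spark.scheduler.mode=FAIR \
  /appdata/bblite-codebase/prima_diabetes_indians.py
```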

On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu 
wrote:

> Hi,
>
> I have submitted a job on* 4 node cluster*, where I see, most of the
> operations happening at one of the worker nodes and other two are simply
> chilling out.
>
> Picture below puts light on that -
>
> How to properly distribute the load?
>
> My cluster conf (4 node cluster [1 driver; 3 slaves]) -
>
> *Cores - 6*
> *RAM - 12 GB*
> *HDD - 60 GB*
>
> My Spark Submit command is as follows -
>
> *spark-submit --master spark://192.168.49.37:7077
>  --num-executors 3 --executor-cores 5
> --executor-memory 4G /appdata/bblite-codebase/prima_diabetes_indians.py*
>
> What to do?
>
> Thanks,
> Aakash.
>


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread akshay naidu
Allocating all the cores alone won't serve the purpose; you'll also have to
set the number of executors and the executor memory accordingly.
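The 10%-overhead figure discussed below (a 20g heap showing up as a ~22gb container) follows YARN's default memoryOverhead rule in Spark 2.x; a quick sketch, assuming the default of max(384 MB, 10% of executor memory):

```python
def yarn_executor_container_mb(executor_memory_mb: int) -> int:
    # Default spark.yarn.executor.memoryOverhead (Spark 2.x) is
    # max(384 MB, 10% of the executor heap); YARN reserves heap + overhead.
    overhead_mb = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + overhead_mb

print(yarn_executor_container_mb(20 * 1024))  # 22528 MB, i.e. the ~22 GB seen in the thread
```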

On Tue 27 Feb, 2018, 12:15 AM Vadim Semenov,  wrote:

> All used cores aren't getting reported correctly in EMR, and YARN itself
> has no control over it, so whatever you put in `spark.executor.cores` will
> be used,
> but in the ResourceManager you will only see 1 vcore used per nodemanager.
>
> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman  wrote:
>
>> Hi,
>>
>> spark version - 2.0.0
>> spark distribution - EMR 5.0.0
>>
>> Spark Cluster - one master, 5 slaves
>>
>> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 SSD GB storage
>> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 SSD GB storage
>>
>>
>> Cluster Metrics:
>> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15 | Containers Running: 5
>> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
>> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
>> Active Nodes: 5 | Decommissioning Nodes: 0 | Decommissioned Nodes: 0 | Lost Nodes: 5 | Unhealthy Nodes: 0 | Rebooted Nodes: 0
>> 
>> I have submitted job with below configuration
>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>
>>
>>
>> spark.task.cpus - be default 1
>>
>>
>> My understanding is that there will be 5 executors, each of which can run
>> 10 tasks at a time, and the tasks share the executor's total memory of 20g.
>> Here I could see only 5 vcores used, which means 1 executor instance uses
>> 20g + 10% overhead RAM (22gb), 10 cores (number of threads), but only 1
>> vcore (cpu).
>>
>> Please correct me if my understanding is wrong.
>>
>> How can I utilize the number of vcores in EMR effectively? Will more
>> vcores boost performance?
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
>


Re: sqoop import job not working when spark thrift server is running.

2018-02-24 Thread akshay naidu
Thanks Jörn,

Fairscheduler is already enabled in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.8</value>
</property>

On Sat, Feb 24, 2018 at 6:26 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Fairscheduler in yarn provides you the possibility to use more resources
> than configured if they are available
>
> On 24. Feb 2018, at 13:47, akshay naidu <akshaynaid...@gmail.com> wrote:
>
> it sure is not able to get sufficient resources from YARN to start the
>> containers.
>>
> that's right. I worked when I reduced executors from thrift but it also
> reduced thrift's performance.
>
> But it is not the solution i am looking forward to. my sqoop import job
> runs just once a day, and thrift apps will running for 24/7 for
> fetching-processing-displaying online reports on website. reducing
> executors and keeping some in spare is helping in running more jobs other
> than thrift parallely but it's wasting the core when other jobs are not
> working.
>
> is there something which can help in allocating resources dynamically?
> which will automatically allocate maximum resources to thrift when there
> are no other jobs running, and automatically share resources with jobs/apps
> other than thrift.?
>
> I've heard of property in yarn - dynamicAlloction , can this help?
>
>
> Thanks.
>
> On Sat, Feb 24, 2018 at 7:14 AM, vijay.bvp <bvpsa...@gmail.com> wrote:
>
>> it sure is not able to get sufficient resources from YARN to start the
>> containers.
>> is it only with this import job or if you submit any other job its failing
>> to start.
>>
>> As a test just try to run another spark job or a mapredue job  and see if
>> the job can be started.
>>
>> Reduce the thrift server executors and see overall there is available
>> cluster capacity for new jobs.
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>


Re: sqoop import job not working when spark thrift server is running.

2018-02-24 Thread akshay naidu
>
> it sure is not able to get sufficient resources from YARN to start the
> containers.
>
That's right. It worked when I reduced the number of executors for thrift,
but that also reduced thrift's performance.

But that is not the solution I am looking for. My sqoop import job runs just
once a day, while the thrift apps run 24/7 for fetching, processing, and
displaying online reports on the website. Reducing executors and keeping some
in spare helps in running jobs other than thrift in parallel, but it wastes
those cores when no other jobs are running.

Is there something which can help in allocating resources dynamically?
Something that will automatically allocate maximum resources to thrift when
there are no other jobs running, and automatically share resources with
jobs/apps other than thrift?

I've heard of a property, dynamicAllocation; can this help?

Thanks.
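For reference, dynamic allocation is a Spark property rather than a YARN one. A minimal sketch of what enabling it in spark-defaults.conf could look like, assuming the external shuffle service has been set up on each NodeManager (the min/max values here are illustrative, not from this cluster):

```properties
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         11
spark.dynamicAllocation.executorIdleTimeout  60s
```

With this, a long-running Thrift Server can grow toward the cluster's capacity when it is alone and release idle executors so that a daily sqoop/mapreduce job can still get containers.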

On Sat, Feb 24, 2018 at 7:14 AM, vijay.bvp  wrote:

> it sure is not able to get sufficient resources from YARN to start the
> containers.
> is it only with this import job or if you submit any other job its failing
> to start.
>
> As a test just try to run another spark job or a mapredue job  and see if
> the job can be started.
>
> Reduce the thrift server executors and see overall there is available
> cluster capacity for new jobs.
>
>
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: sqoop import job not working when spark thrift server is running.

2018-02-20 Thread akshay naidu
Hello vijay,
I appreciate your reply.

> what was the error when you are trying to run the mapreduce import job when
> the thrift server is running.

It didn't throw any error; it just gets stuck at

INFO mapreduce.Job: Running job: job_151911053

and resumes the moment I kill Thrift.

thanks

On Tue, Feb 20, 2018 at 1:48 PM, vijay.bvp  wrote:

> what was the error when you are trying to run mapreduce import job when the
> thrift server is running.
> this is only config changed? what was the config before...
> also share the spark thrift server job config such as no of executors,
> cores
> memory etc.
>
> My guess is your mapreduce job is unable to get sufficient resources,
> container couldn't be launched and so failing to start, this could either
> because of non availability sufficient cores or RAM
>
> 9 worker nodes 12GB RAM each with 6 cores (max allowed cores 4 per
> container)
> you have to keep some room for operation system and other daemons.
>
> if thrift server is setup to have 11 executors with 3 cores each = 33 cores
> for workers and 1 for driver so 34 cores required for spark job and rest
> for
> any other jobs.
>
> spark driver and worker memory is ~9GB
> with 9 12 GB RAM worker nodes not sure how much you can allocate.
>
> thanks
> Vijay
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


sqoop import job not working when spark thrift server is running.

2018-02-19 Thread akshay naidu
Hello ,

I was trying to optimize my spark cluster. I did it to some extent by making
some changes in the yarn-site.xml and spark-defaults.conf files. Before the
changes, the mapreduce import job was running fine alongside a slow thrift
server. After the changes, I have to kill the thrift server to execute my
sqoop import job.

following are the configurations-

*yarn-site.xml*

<property>
  <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
  <value>1.0</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>


*spark-defaults.conf*

spark.master   yarn
spark.driver.memory9g
spark.executor.memory  8570m
spark.yarn.executor.memoryOverhead 646m

spark.executor.instances   11
spark.executor.cores   3
spark.default.parallelism30

SPARK_WORKER_MEMORY 10g
SPARK_WORKER_INSTANCES 1
SPARK_WORKER_CORES 4

SPARK_DRIVER_MEMORY 9g
SPARK_DRIVER_CORES 3

SPARK_MASTER_PORT 7077

SPARK_EXECUTOR_INSTANCES 11
SPARK_EXECUTOR_CORES 3
SPARK_EXECUTOR_MEMORY 8570m


*Resources in the cluster of 9 nodes:*
12GB RAM and 6 cores on each node.


Thanks for your time.
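As a sanity check on why the import job starves, the vcore arithmetic for this configuration can be sketched (figures from the configs above; the single vcore for the YARN application master is an assumption):

```python
# Cluster: 9 nodes, yarn.nodemanager.resource.cpu-vcores = 4 each
total_vcores = 9 * 4                     # 36

# Thrift server: 11 executors x 3 cores, plus ~1 vcore for the YARN AM/driver
thrift_vcores = 11 * 3 + 1               # 34

leftover = total_vcores - thrift_vcores
print(leftover)  # 2 -> too few vcores left to start the sqoop job's containers
```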


Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
A small hint would be very helpful.

On Wed, Feb 14, 2018 at 5:17 PM, akshay naidu <akshaynaid...@gmail.com>
wrote:

> Hello Siva,
> Thanks for your reply.
>
> Actually i'm trying to generate online reports for my clients. For this I
> want the jobs should be executed faster without putting any job on QUEUE
> irrespective of the number of jobs different clients are executing from
> different locations.
> currently , a job processing 17GB of data takes more than 20mins to
> execute. also only 6 jobs run simultaneously and the remaining one are in
> WAITING stage.
>
> Thanks
>
> On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.s...@yahoo.com>
> wrote:
>
>>
>> Hello Akshay,
>>
>> I see there are 6 slaves * with 1 spark Instance each * 5 cores on each
>> Instance => 30 cores in total
>> Do you have any other pools confuted ? Running 8 jobs should be triggered
>> in parallel with the number of cores you have.
>>
>> For your long running job, did you have a chance to look at Tasks thats
>> being triggered.
>>
>> I would recommend slow running job to be configured in a separate pool.
>>
>> Regards
>> Shiv
>>
>> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>> 
>> **
>> yarn-site.xml
>>
>>
>>  
>> yarn.scheduler.fair.preemption.cluster-utilization-
>> threshold
>> 0.8
>>   
>>
>> 
>> yarn.scheduler.minimum-allocation-mb
>> 3584
>> 
>>
>> 
>> yarn.scheduler.maximum-allocation-mb
>> 10752
>> 
>>
>> 
>> yarn.nodemanager.resource.memory-mb
>> 10752
>>
>> 
>> **
>> spark-defaults.conf
>>
>> spark.master   yarn
>> spark.driver.memory9g
>> spark.executor.memory  1024m
>> spark.yarn.executor.memoryOverhead 1024m
>> spark.eventLog.enabled  true
>> spark.eventLog.dir hdfs://tech-master:54310/spark-logs
>>
>> spark.history.providerorg.apache.spark.deploy.histor
>> y.FsHistoryProvider
>> spark.history.fs.logDirectory hdfs://tech-master:54310/spark-logs
>> spark.history.fs.update.interval  10s
>> spark.history.ui.port 18080
>>
>> spark.ui.enabledtrue
>> spark.ui.port   4040
>> spark.ui.killEnabledtrue
>> spark.ui.retainedDeadExecutors  100
>>
>> spark.scheduler.modeFAIR
>> spark.scheduler.allocation.file /usr/local/spark/current/conf/
>> fairscheduler.xml
>>
>> #spark.submit.deployMode cluster
>> spark.default.parallelism30
>>
>> SPARK_WORKER_MEMORY 10g
>> SPARK_WORKER_INSTANCES 1
>> SPARK_WORKER_CORES 5
>>
>> SPARK_DRIVER_MEMORY 9g
>> SPARK_DRIVER_CORES 5
>>
>> SPARK_MASTER_IP Tech-master
>> SPARK_MASTER_PORT 7077
>>
>> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I'm try to run multiple spark jobs on cluster running in yarn.
>>> Master is 24GB server with 6 Slaves of 12GB
>>>
>>> fairscheduler.xml settings are -
>>> 
>>> FAIR
>>> 10
>>> 2
>>> 
>>>
>>> I am running 8 jobs simultaneously , jobs are running parallelly but not
>>> all.
>>> at a time only 7 of then runs simultaneously while the 8th one is in
>>> queue WAITING for a job to stop.
>>>
>>> also, out of the 7 running jobs, 4 runs comparatively much faster than
>>> remaining three (maybe resources are not distributed properly) .
>>>
>>> I want to run n number of jobs at a time and make them run faster ,
>>> Right now, one job is taking more than three minutes while processing a max
>>> of 1GB data .
>>>
>>> Kindly assist me. what am I missing.
>>>
>>> Thanks.
>>>
>>
>>
>>
>


Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
Hello Siva,
Thanks for your reply.

Actually I'm trying to generate online reports for my clients. For this I
want the jobs to be executed faster without putting any job in the QUEUE,
irrespective of the number of jobs different clients are executing from
different locations.
Currently, a job processing 17GB of data takes more than 20 mins to execute;
also, only 6 jobs run simultaneously and the remaining ones are in the
WAITING stage.

Thanks

On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.s...@yahoo.com>
wrote:

>
> Hello Akshay,
>
> I see there are 6 slaves * with 1 spark Instance each * 5 cores on each
> Instance => 30 cores in total
> Do you have any other pools confuted ? Running 8 jobs should be triggered
> in parallel with the number of cores you have.
>
> For your long running job, did you have a chance to look at Tasks thats
> being triggered.
>
> I would recommend slow running job to be configured in a separate pool.
>
> Regards
> Shiv
>
> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaid...@gmail.com> wrote:
>
> 
> **
> yarn-site.xml
>
>
>  
> yarn.scheduler.fair.preemption.cluster-
> utilization-threshold
> 0.8
>   
>
> 
> yarn.scheduler.minimum-allocation-mb
> 3584
> 
>
> 
> yarn.scheduler.maximum-allocation-mb
> 10752
> 
>
> 
> yarn.nodemanager.resource.memory-mb
> 10752
>
> 
> **
> spark-defaults.conf
>
> spark.master   yarn
> spark.driver.memory9g
> spark.executor.memory  1024m
> spark.yarn.executor.memoryOverhead 1024m
> spark.eventLog.enabled  true
> spark.eventLog.dir hdfs://tech-master:54310/spark-logs
>
> spark.history.providerorg.apache.spark.deploy.
> history.FsHistoryProvider
> spark.history.fs.logDirectory hdfs://tech-master:54310/spark-logs
> spark.history.fs.update.interval  10s
> spark.history.ui.port 18080
>
> spark.ui.enabledtrue
> spark.ui.port   4040
> spark.ui.killEnabledtrue
> spark.ui.retainedDeadExecutors  100
>
> spark.scheduler.modeFAIR
> spark.scheduler.allocation.file /usr/local/spark/current/conf/
> fairscheduler.xml
>
> #spark.submit.deployMode cluster
> spark.default.parallelism    30
>
> SPARK_WORKER_MEMORY 10g
> SPARK_WORKER_INSTANCES 1
> SPARK_WORKER_CORES 5
>
> SPARK_DRIVER_MEMORY 9g
> SPARK_DRIVER_CORES 5
>
> SPARK_MASTER_IP Tech-master
> SPARK_MASTER_PORT 7077
>
> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com>
> wrote:
>
>> Hello,
>> I'm try to run multiple spark jobs on cluster running in yarn.
>> Master is 24GB server with 6 Slaves of 12GB
>>
>> fairscheduler.xml settings are -
>> 
>> FAIR
>> 10
>> 2
>> 
>>
>> I am running 8 jobs simultaneously , jobs are running parallelly but not
>> all.
>> at a time only 7 of then runs simultaneously while the 8th one is in
>> queue WAITING for a job to stop.
>>
>> also, out of the 7 running jobs, 4 runs comparatively much faster than
>> remaining three (maybe resources are not distributed properly) .
>>
>> I want to run n number of jobs at a time and make them run faster , Right
>> now, one job is taking more than three minutes while processing a max of
>> 1GB data .
>>
>> Kindly assist me. what am I missing.
>>
>> Thanks.
>>
>
>
>


Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
**
yarn-site.xml


 

<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.8</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3584</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10752</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10752</value>
</property>
**
spark-defaults.conf

spark.master   yarn
spark.driver.memory9g
spark.executor.memory  1024m
spark.yarn.executor.memoryOverhead 1024m
spark.eventLog.enabled  true
spark.eventLog.dir hdfs://tech-master:54310/spark-logs

spark.history.provider     org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://tech-master:54310/spark-logs
spark.history.fs.update.interval  10s
spark.history.ui.port 18080

spark.ui.enabledtrue
spark.ui.port   4040
spark.ui.killEnabledtrue
spark.ui.retainedDeadExecutors  100

spark.scheduler.modeFAIR
spark.scheduler.allocation.file /usr/local/spark/current/conf/fairscheduler.xml

#spark.submit.deployMode cluster
spark.default.parallelism30

SPARK_WORKER_MEMORY 10g
SPARK_WORKER_INSTANCES 1
SPARK_WORKER_CORES 5

SPARK_DRIVER_MEMORY 9g
SPARK_DRIVER_CORES 5

SPARK_MASTER_IP Tech-master
SPARK_MASTER_PORT 7077
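One thing worth checking with this configuration: YARN normalizes every container request up to a multiple of yarn.scheduler.minimum-allocation-mb. A sketch of that arithmetic with the values above, assuming the default max(384 MB, 10%) memory overhead where none is set explicitly:

```python
import math

def yarn_container_mb(requested_mb: int, min_allocation_mb: int = 3584) -> int:
    # Heap + default overhead, rounded up to the scheduler's allocation increment.
    overhead_mb = max(384, int(requested_mb * 0.10))
    return math.ceil((requested_mb + overhead_mb) / min_allocation_mb) * min_allocation_mb

print(yarn_container_mb(9 * 1024))  # 10752 -> a 9g driver reserves a full 10752 MB container
print(yarn_container_mb(1024))      # 3584  -> even a 1g executor costs 3584 MB
```

Whether the 9g driver lands on YARN at all depends on the deploy mode (the config above leaves spark.submit.deployMode commented out, i.e. client mode), but every 1g executor still reserves a full 3584 MB container under these settings.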

On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com>
wrote:

> Hello,
> I'm try to run multiple spark jobs on cluster running in yarn.
> Master is 24GB server with 6 Slaves of 12GB
>
> fairscheduler.xml settings are -
> 
> FAIR
> 10
> 2
> 
>
> I am running 8 jobs simultaneously , jobs are running parallelly but not
> all.
> at a time only 7 of then runs simultaneously while the 8th one is in queue
> WAITING for a job to stop.
>
> also, out of the 7 running jobs, 4 runs comparatively much faster than
> remaining three (maybe resources are not distributed properly) .
>
> I want to run n number of jobs at a time and make them run faster , Right
> now, one job is taking more than three minutes while processing a max of
> 1GB data .
>
> Kindly assist me. what am I missing.
>
> Thanks.
>


Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com>
wrote:

> Hello,
> I'm try to run multiple spark jobs on cluster running in yarn.
> Master is 24GB server with 6 Slaves of 12GB
>
> fairscheduler.xml settings are -
> 
> FAIR
> 10
> 2
> 
>
> I am running 8 jobs simultaneously , jobs are running parallelly but not
> all.
> at a time only 7 of then runs simultaneously while the 8th one is in queue
> WAITING for a job to stop.
>
> also, out of the 7 running jobs, 4 runs comparatively much faster than
> remaining three (maybe resources are not distributed properly) .
>
> I want to run n number of jobs at a time and make them run faster , Right
> now, one job is taking more than three minutes while processing a max of
> 1GB data .
>
> Kindly assist me. what am I missing.
>
> Thanks.
>


Run Multiple Spark jobs. Reduce Execution time.

2018-02-13 Thread akshay naidu
Hello,
I'm trying to run multiple spark jobs on a cluster running on yarn.
The master is a 24GB server with 6 slaves of 12GB each.

fairscheduler.xml settings are -

<schedulingMode>FAIR</schedulingMode>
<weight>10</weight>
<minShare>2</minShare>

I am running 8 jobs simultaneously; the jobs run in parallel, but not all of
them: at a time only 7 of them run simultaneously while the 8th one is in the
queue, WAITING for a job to stop.

Also, out of the 7 running jobs, 4 run comparatively much faster than the
remaining three (maybe resources are not distributed properly).

I want to run n number of jobs at a time and make them run faster. Right
now, one job takes more than three minutes while processing at most 1GB of
data.

Kindly assist me; what am I missing?

Thanks.


Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread akshay naidu
Yes, the spark download page does mention that 2.2.1 is for 'hadoop-2.7 and
later', but my confusion is because spark was released on 1st Dec and the
hadoop-3 stable version was released on 13th Dec. And to my similar question
on stackoverflow.com
<https://stackoverflow.com/questions/47920005/how-is-hadoop-3-0-0-s-compatibility-with-older-versions-of-hive-pig-sqoop-and>
, Mr. Jacek Laskowski
<https://stackoverflow.com/users/1305344/jacek-laskowski> replied that
spark-2.2.1 doesn't support hadoop-3, so I am just looking for more clarity
on this doubt before moving on to upgrades.

Thanks all for the help.
Akshay.

On Mon, Jan 8, 2018 at 8:47 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> AFAIK, there's no large scale test for Hadoop 3.0 in the community. So it
> is not clear whether it is supported or not (or has some issues). I think
> in the download page "Pre-Built for Apache Hadoop 2.7 and later" mostly
> means that it supports Hadoop 2.7+ (2.8...), but not 3.0 (IIUC).
>
> Thanks
> Jerry
>
> 2018-01-08 4:50 GMT+08:00 Raj Adyanthaya <raj...@gmail.com>:
>
>> Hi Akshay
>>
>> On the Spark Download page when you select Spark 2.2.1 it gives you an
>> option to select package type. In that, there is an option to select
>> "Pre-Built for Apache Hadoop 2.7 and later". I am assuming it means that it
>> does support Hadoop 3.0.
>>
>> http://spark.apache.org/downloads.html
>>
>> Thanks,
>> Raj A.
>>
>> On Sat, Jan 6, 2018 at 8:23 PM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>>> hello Users,
>>> I need to know whether we can run latest spark on  latest hadoop version
>>> i.e., spark-2.2.1 released on 1st dec and hadoop-3.0.0 released on 13th dec.
>>> thanks.
>>>
>>
>>
>


Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-06 Thread akshay naidu
Hello users,
I need to know whether we can run the latest spark on the latest hadoop
version, i.e., spark-2.2.1 (released on 1st Dec) and hadoop-3.0.0 (released
on 13th Dec).
thanks.