Too Many Accumulators in my Spark Job

2016-03-12 Thread Harshvardhan Chauhan

Hi,
Hi,
My question is about having a lot of counters in Spark to keep track of bad/null
values in my RDD. It is described in detail in the Stack Overflow link below:
http://stackoverflow.com/questions/35953400/too-many-accumulators-in-spark-job
Posting to the user group to get more traction. Appreciate your help!
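The pattern behind the question can be sketched as follows. This is a minimal
sketch against the Spark 1.x accumulator API; the input data and the notion of a
"bad" record are hypothetical, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: one named accumulator per error category (Spark 1.x API).
// With many categories this multiplies quickly, which is the question's point.
object BadRecordCounters {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("counters"))

    val nullCount = sc.accumulator(0L, "null values")
    val badCount  = sc.accumulator(0L, "bad values")

    // Hypothetical input; in the thread this is an RDD read from real data.
    val records = sc.parallelize(Seq("ok", null, "bad", "ok"))
    val cleaned = records.filter { r =>
      if (r == null)       { nullCount += 1L; false }
      else if (r == "bad") { badCount  += 1L; false }
      else true
    }

    cleaned.count()  // accumulators are only updated once an action runs
    println(s"nulls=${nullCount.value}, bad=${badCount.value}")
    sc.stop()
  }
}
```

Note that accumulator updates made inside transformations can be applied more
than once when tasks are retried, which is one reason a large set of counters
gets hard to trust.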

Thanks, Harshvardhan Chauhan Software Engineer |
GumGum | Ads that stick
310-260-9666 | ha...@gumgum.com

Escaping tabs and newlines not working

2016-01-27 Thread Harshvardhan Chauhan
Hi,
Escaping newline and tab characters doesn't seem to work for me. Spark version 1.5.2 on EMR,
reading files from S3.
Here are more details about my issue:


Scala: escaping newline and tab characters. I am trying to use the following code
to get rid of tab and newline characters
in the URL, but I still get newline and… (stackoverflow.com)
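The linked question is about stripping control characters from URL strings. A
minimal sketch of one way to do it in plain Scala (the object and method names
are hypothetical, not from the thread):

```scala
// Sketch: remove tab, newline, and carriage-return characters from a string.
object CleanUrl {
  def clean(url: String): String =
    // The regex [\t\n\r] matches real control characters. A common pitfall:
    // if the input file contains the two-character sequence backslash + "t"
    // (literal "\t" text), this pattern will NOT match it; you would need to
    // match the literal backslash sequence instead.
    url.replaceAll("[\\t\\n\\r]", "")

  def main(args: Array[String]): Unit = {
    println(clean("http://example.com/\tpath\n"))  // prints http://example.com/path
  }
}
```

The literal-backslash case is worth checking first when "escaping doesn't seem
to work": print the raw bytes of a sample record to see which form the data
actually contains.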


unsubscribe

2015-07-28 Thread Harshvardhan Chauhan


Re: S3 SubFolder Write Issues

2015-03-12 Thread Harshvardhan Chauhan
I use s3n://BucketName/SomeFolder/OutputFolder and it works for my app.
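The working pattern described in this reply can be sketched as below. The bucket
and folder names are placeholders, and `rdd` stands for any RDD to be saved:

```scala
// Sketch: write with the s3n:// scheme and a full key prefix (names are
// placeholders). Note that S3 has no real directories; the "subfolder" is
// just part of the object key, and the output prefix must not already exist.
rdd.saveAsTextFile("s3n://BucketName/SomeFolder/OutputFolder")
```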

On Wed, Mar 11, 2015 at 12:14 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Does it write anything in BUCKET/SUB_FOLDER/output?

 Thanks
 Best Regards

 On Wed, Mar 11, 2015 at 10:15 AM, cpalm3 cpa...@gmail.com wrote:

 Hi All,

 I am hoping someone has seen this issue before with S3, as I haven't been
 able to find a solution for this problem.

 When I try to save as Text file to s3 into a subfolder, it only ever
 writes
 out to the bucket level folder
 and produces block level generated file names and not my output folder as
 I
 specified.
 Below is the sample code in Scala, I have also seen this behavior in the
 Java code.

  val out = inputRdd.map { ir => mapFunction(ir) }.groupByKey()
    .mapValues { x => mapValuesFunction(x) }
  out.saveAsTextFile("s3://BUCKET/SUB_FOLDER/output")

 Any ideas on how to get saveAsTextFile to write to an S3 subfolder?

 Thanks,
 Chris



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/S3-SubFolder-Write-Issues-tp21997.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org







Re: issue Running Spark Job on Yarn Cluster

2015-02-19 Thread Harshvardhan Chauhan
Is this the full stack trace ?

On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh sachin.sha...@gmail.com
wrote:

 Hi,
 I want to run my spark Job in Hadoop yarn Cluster mode,
 I am using below command -
 spark-submit --master yarn-cluster --driver-memory 1g --executor-memory 1g
 --executor-cores 1 --class com.dc.analysis.jobs.AggregationJob
 sparkanalitic.jar param1 param2 param3
 I am getting the error below. Kindly suggest what's going wrong, and whether the
 command is proper or not. Thanks in advance.

 Exception in thread main org.apache.spark.SparkException: Application
 finished with failed status
 at
 org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:509)
 at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
 at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
 at org.apache.spark.deploy.yarn.Client.main(Client.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/issue-Running-Spark-Job-on-Yarn-Cluster-tp21697.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Re: OOM error

2015-02-17 Thread Harshvardhan Chauhan
Thanks for the pointer it led me to
http://spark.apache.org/docs/1.2.0/tuning.html increasing parallelism
resolved the issue.
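The fix from the tuning guide can be sketched as follows. This is a sketch
against the Spark 1.x configuration API; the 120-executor figure comes from the
thread below, while the cores-per-executor and tasks-per-core numbers are
illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: raise default parallelism so each task processes a smaller
// partition and stays within the executor heap. The tuning guide suggests
// roughly 2-3 tasks per CPU core in the cluster.
val numExecutors  = 120 // from the thread
val coresPerExec  = 1   // assumption
val tasksPerCore  = 3   // tuning-guide rule of thumb

val conf = new SparkConf()
  .setAppName("oom-tuning")
  .set("spark.default.parallelism",
       (numExecutors * coresPerExec * tasksPerCore).toString)

val sc = new SparkContext(conf)

// Alternatively, raise the partition count explicitly on a wide operation:
// rdd.groupByKey(numPartitions = 360)
```

Raising parallelism is often preferable to simply adding executor memory,
because wide operations like groupByKey must hold each partition's data for a
key in memory at once.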



On Mon, Feb 16, 2015 at 11:57 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Increase your executor memory, Also you can play around with increasing
 the number of partitions/parallelism etc.

 Thanks
 Best Regards

 On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com
 wrote:

 Hi All,


 I need some help with Out Of Memory errors in my application. I am using
 Spark 1.1.0 and my application is using the Java API. I am running my app on
 EC2 with 25 m3.xlarge (4 cores, 15 GB memory) instances. The app only fails
 sometimes. Lots of mapToPair tasks are failing. My app is configured to run
 120 executors and executor memory is 2G.

 These are the various errors I see in my logs:

 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 
 dropped from memory (free 257277829)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x6e0138a3, 
 /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.&lt;init&gt;(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, 
 /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.&lt;init&gt;(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0xd4211985, 
 /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.&lt;init&gt;(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer

OOM error

2015-02-16 Thread Harshvardhan Chauhan
)
java.lang.Thread.run(Thread.java:745)



15/02/16 10:57:26 INFO
remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut
down; proceeding with flushing remote transports.
Exception in thread Driver java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
Caused by: org.apache.spark.SparkException: Job cancelled because
SparkContext was shut down
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
at 
akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
at akka.actor.ActorCell.terminate(ActorCell.scala:338)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
at akka.dispatch.Mailbox.run(Mailbox.scala:218)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

