Urgent: Changes required in the archive

2019-09-25 Thread Vishal Verma
Hi All,

The help request I sent to the Apache Spark developers on 20 Dec 2017 needs
to be edited, as it contains our client's sensitive information about their
infrastructure and a user ID. I understand the ASF's policies and regret the
mistake made earlier. The mistake has been raised with the client as well,
and it may lead to repercussions for the company, and for me in particular,
which is why I am asking for support in correcting it.
Link to the prior content of the mail/archive:
http://mail-archives.apache.org/mod_mbox/spark-dev/201712.mbox/%3CCAJAL6iof9=chlba+7pvovftcrkp0_7d3bcb8w05vg6tqsk_...@mail.gmail.com%3E


You can replace the earlier content with the following version of the error
log; I have replaced all the sensitive information in the content while
keeping the error as it is.


> 17/12/20 11:07:16 INFO executor.CoarseGrainedExecutorBackend: Started
> daemon with process name: 19581@*localhost*
> 17/12/20 11:07:16 INFO util.SignalUtils: Registered signal handler for TERM
> 17/12/20 11:07:16 INFO util.SignalUtils: Registered signal handler for HUP
> 17/12/20 11:07:16 INFO util.SignalUtils: Registered signal handler for INT
> 17/12/20 11:07:16 INFO spark.SecurityManager: Changing view acls to: yarn,
> *user-me*
> 17/12/20 11:07:16 INFO spark.SecurityManager: Changing modify acls to:
> yarn,*user-me*
> 17/12/20 11:07:16 INFO spark.SecurityManager: Changing view acls groups to:
> 17/12/20 11:07:16 INFO spark.SecurityManager: Changing modify acls groups
> to:
> 17/12/20 11:07:16 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users  with view permissions:
> Set(yarn, user-me); groups with view permissions: Set(); users with modify
> permissions: Set(yarn, user-me); groups with modify permissions: Set()
> 17/12/20 11:07:16 INFO client.TransportClientFactory: Successfully created
> connection to *localhost*:35617 after 48 ms (0 ms spent in bootstraps)
> 17/12/20 11:07:17 INFO spark.SecurityManager: Changing view acls to:
> yarn,user-me
> 17/12/20 11:07:17 INFO spark.SecurityManager: Changing modify acls to:
> yarn,user-me
> 17/12/20 11:07:17 INFO spark.SecurityManager: Changing view acls groups to:
> 17/12/20 11:07:17 INFO spark.SecurityManager: Changing modify acls groups
> to:
> 17/12/20 11:07:17 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users  with view permissions:
> Set(yarn, user-me); groups with view permissions: Set(); users with modify
> permissions: Set(yarn, user-me); groups with modify permissions: Set()
> 17/12/20 11:07:17 INFO client.TransportClientFactory: Successfully created
> connection to 127.0.0.1:35617 after 1 ms (0 ms spent in bootstraps)
> 17/12/20 11:07:17 INFO storage.DiskBlockManager: Created local directory
> at
> /hadoop/yarn/nm-local-dir/usercache/user-me/appcache/application_1512677738429_16167/blockmgr-d585ecec-829a-432b-a8f1-89503359510e
> 17/12/20 11:07:17 INFO memory.MemoryStore: MemoryStore started with
> capacity 7.8 GB
> 17/12/20 11:07:17 INFO executor.CoarseGrainedExecutorBackend: Connecting
> to driver: spark://CoarseGrainedScheduler@*localhost*:35617
> 17/12/20 11:07:17 INFO executor.CoarseGrainedExecutorBackend: Successfully
> registered with driver
> 17/12/20 11:07:17 INFO executor.Executor: Starting executor ID 1 on host
> localhost
> 17/12/20 11:07:17 INFO util.Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60054.
> 17/12/20 11:07:17 INFO netty.NettyBlockTransferService: Server created on
> *localhost*:60054
> 17/12/20 11:07:17 INFO storage.BlockManager: Using
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
> policy
> 17/12/20 11:07:17 INFO storage.BlockManagerMaster: Registering
> BlockManager BlockManagerId(1, *localhost*, 60054, None)
> 17/12/20 11:07:17 INFO storage.BlockManagerMaster: Registered BlockManager
> BlockManagerId(1, *localhost*, 60054, None)
> 17/12/20 11:07:17 INFO storage.BlockManager: external shuffle service port
> = 7337
> 17/12/20 11:07:17 INFO storage.BlockManager: Registering executor with
> local external shuffle service.
> 17/12/20 11:07:17 INFO client.TransportClientFactory: Successfully created
> connection to *localhost/127.0.0.1:*7337 after 1 ms (0
> ms spent in bootstraps)
> 17/12/20 11:07:17 INFO storage.BlockManager: Initialized BlockManager:
> BlockManagerId(1, *localhost*, 60054, None)
> 17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED
> SIGNAL TERM
> 17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
> 17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called



I have gone through the policies for the archives and understand the effort
involved in changing the archives while preserving their integrity.
I would be very thankful if you could understand the problem I am going
through and help me correct my mistake.

-- 
Thanks & Regards,

*Vishal Verma*
*BigData Engineer* | Exadatum Software Services Pvt

Standalone Spark, How to find (driver's) final status for an application

2019-09-25 Thread Nilkanth Patel
I am setting up Spark 2.2.0 in standalone mode (
https://spark.apache.org/docs/latest/spark-standalone.html) and submitting
Spark jobs programmatically using:

 SparkLauncher sparkAppLauncher = new SparkLauncher(userNameMap)
         .setMaster(sparkMaster)
         .setAppName(appName);   // ... further builder calls elided
 SparkAppHandle sparkAppHandle = sparkAppLauncher.startApplication();
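
Since the jobs are already started through SparkLauncher, one option worth
noting is that the handle returned by startApplication() can report state
transitions on its own, without polling the master's JSON endpoint. Below is
a minimal sketch of attaching a listener; the master URL, jar path, and class
names are placeholders rather than values from this setup, and whether the
handle ends in FAILED for a failed driver depends on the deploy mode, so this
is an approach to verify, not a confirmed fix.

```
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherStateWatcher {

    public static void main(String[] args) throws IOException, InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        // Placeholder master URL, jar and class names -- substitute the values
        // already passed to the existing launcher.
        SparkLauncher launcher = new SparkLauncher()
                .setMaster("spark://192.168.1.139:7077")
                .setAppResource("/path/to/your-app.jar")
                .setMainClass("com.example.YourMain")
                .setAppName("my-app");

        SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle h) {
                System.out.println("state=" + h.getState() + " appId=" + h.getAppId());
                if (h.getState().isFinal()) {   // FINISHED, FAILED, KILLED or LOST
                    done.countDown();
                }
            }

            @Override
            public void infoChanged(SparkAppHandle h) {
                // The application id becomes available once the app registers.
            }
        });

        done.await();
        System.out.println("final state: " + handle.getState());
    }
}
```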

 I have a Java client program that polls job status for the jobs submitted
programmatically, for which I am using the following REST endpoint:
 curl http://192.168.1.139:8080/json/, which provides a JSON response like
the following:

{
  "url" : "spark://192.168.1.139:7077",
  "workers" : [ {
      "id" : "x", "host" : "x", "port" : x, "webuiaddress" : "x",
      "cores" : x, "coresused" : x, "coresfree" : x,
      "memory" : xx, "memoryused" : xx, "memoryfree" : xx,
      "state" : "x", "lastheartbeat" : x
    }, { ... } ],
  "cores" : x,
  "coresused" : x,
  "memory" : x,
  "memoryused" : x,
  "activeapps" : [ ],
  "completedapps" : [ {
      "starttime" : x, "id" : "app-xx-", "name" : "abc", "user" : "xx",
      "memoryperslave" : x, "submitdate" : "x",
      "state" : "FINISHED OR RUNNING", "duration" : x
    }, { ... } ],
  "activedrivers" : [ ],
  "status" : "x"
}


In the above response, I have observed that the state for completedapps is
always FINISHED even if the application fails, while on the UI
(http://master:8080) the associated driver shows a FAILED state, as below.

[Master UI screenshots showing the completed application as FINISHED and the
associated driver as FAILED.]

Referring to the above example: currently my Java client gets the status
FINISHED for the application (app-20190925115750-0003), even though it
failed (encountered an exception) and the associated driver shows a FAILED
state. I intend to report the final status in this case as FAILED.
It seems that if I can correlate an application ID (app-20190925115750-0003)
to a driver ID (driver-20190925115748-0003), I can report a FAILED (final)
status, but I could not find any correlation between them (app ID -->
driver ID).

Looking forward to your suggestions for resolving this, or any other
possible approaches to achieve it. I have also come across some hidden REST
APIs such as
http://xx.xx.xx.xx:6066/v1/submissions/status/driver-20190925115748-0003,
which seem to return only limited information in the response.
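
For what it's worth, that submission-status endpoint is keyed by driver ID
and, to my understanding, its JSON response includes a driverState field
(e.g. RUNNING, FINISHED, FAILED, KILLED), which is exactly what the master's
/json view does not expose. A rough sketch of polling it from Java follows;
the host, port, and driver ID from this thread are used purely as
placeholders, and the regex-based parsing is only for illustration (a JSON
library would be the robust choice):

```
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DriverStatusPoller {

    // Fetches /v1/submissions/status/<driverId> from the master's REST
    // submission server and pulls out the driverState field.
    static String driverState(String masterRestUrl, String driverId) throws IOException {
        URL url = new URL(masterRestUrl + "/v1/submissions/status/" + driverId);
        try (InputStream in = url.openStream();
             Scanner sc = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            String body = sc.hasNext() ? sc.next() : "";
            Matcher m = Pattern.compile("\"driverState\"\\s*:\\s*\"([^\"]+)\"").matcher(body);
            return m.find() ? m.group(1) : "UNKNOWN";
        }
    }

    public static void main(String[] args) throws IOException {
        // 6066 is the default REST submission port; the driver id is the one
        // shown on the master UI, as in the URL above.
        System.out.println(driverState("http://192.168.1.139:6066",
                "driver-20190925115748-0003"));
    }
}
```

This still leaves the original problem of mapping an application ID to its
driver ID, which this endpoint does not address.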


Thanks in advance.
Nilkanth.


Re: intermittent Kryo serialization failures in Spark

2019-09-25 Thread Jerry Vinokurov
Hi Julien,

Thanks for the suggestion. If we don't do a broadcast, that would
presumably affect the performance of the job, as the model that is failing
to broadcast is something that needs to be shared across the cluster. But it
may be worth it if the alternative is things not running properly. Vadim's
suggestions did not make a difference for me (I'm still hitting this error
several times a day), but I'll try disabling the broadcast and see if that
does anything.

thanks,
Jerry
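
A minimal sketch of the two variants being weighed here, with and without a
broadcast; the model and record types are hypothetical stand-ins rather than
anything from this thread, and the point is only to illustrate the trade-off
(serialize and broadcast once per executor vs. shipping the object with
every task closure):

```
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastVsClosure {

    // Hypothetical stand-in for the model being shared across the cluster.
    static class Model implements Serializable {
        double score(int x) { return x * 2.0; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("broadcast-vs-closure")
                .setMaster("local[2]");   // local master only so the sketch runs standalone
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            Model model = new Model();
            JavaRDD<Integer> records = jsc.parallelize(Arrays.asList(1, 2, 3));

            // Variant A: broadcast the model once per executor. The broadcast
            // is serialized up front, which is the code path in the stack
            // trace below (TorrentBroadcast.blockifyObject).
            Broadcast<Model> bcModel = jsc.broadcast(model);
            JavaRDD<Double> withBroadcast = records.map(x -> bcModel.value().score(x));

            // Variant B: no broadcast -- the model is captured in the task
            // closure and shipped with every task instead.
            Model localModel = model;
            JavaRDD<Double> withoutBroadcast = records.map(x -> localModel.score(x));

            System.out.println(withBroadcast.collect());
            System.out.println(withoutBroadcast.collect());
        }
    }
}
```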

On Fri, Sep 20, 2019 at 10:00 AM Julien Laurenceau <
julien.laurenc...@pepitedata.com> wrote:

> Hi,
> Did you try without the broadcast?
> Regards
> JL
>
> On Thu, Sep 19, 2019 at 06:41, Vadim Semenov 
> wrote:
>
>> Pre-register your classes:
>>
>> ```
>> import com.esotericsoftware.kryo.Kryo
>> import org.apache.spark.serializer.KryoRegistrator
>>
>> class MyKryoRegistrator extends KryoRegistrator {
>>   override def registerClasses(kryo: Kryo): Unit = {
>>     kryo.register(Class.forName("[[B")) // byte[][]
>>     kryo.register(classOf[java.lang.Class[_]])
>>   }
>> }
>> ```
>>
>> then run with
>>
>> 'spark.kryo.referenceTracking': 'false',
>> 'spark.kryo.registrationRequired': 'false',
>> 'spark.kryo.registrator': 'com.datadog.spark.MyKryoRegistrator',
>> 'spark.kryo.unsafe': 'false',
>> 'spark.kryoserializer.buffer.max': '256m',
>>
>> On Tue, Sep 17, 2019 at 10:38 AM Jerry Vinokurov 
>> wrote:
>>
>>> Hi folks,
>>>
>>> Posted this some time ago but the problem continues to bedevil us. I'm
>>> including a (slightly edited) stack trace that results from this error. If
>>> anyone can shed any light on what exactly is happening here and what we can
>>> do to avoid it, that would be much appreciated.
>>>
>>> org.apache.spark.SparkException: Failed to register classes with Kryo
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:140)
    at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:324)
    at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:309)
    at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:218)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:288)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:127)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1489)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:103)
    at org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:129)
    at org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:165)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:309)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:305)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:327)
    at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
    at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)