Hi David

I changed the code to use the new API to create the Crail configuration.
Please pull, build and install the newest version.
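
To give you an idea of the change: the dispatcher used to call the empty
CrailConfiguration constructor directly and now goes through one of the new
static factory methods. A rough sketch (the file-based factory method below is
the one shown in the current Crail documentation; the object name and the rest
of the scaffolding are only for illustration):

    import org.apache.crail.CrailStore
    import org.apache.crail.conf.CrailConfiguration

    object CrailConfExample {
      def main(args: Array[String]): Unit = {
        // Old API (crail-client 1.0): direct construction. Against
        // 1.2-incubating-SNAPSHOT this constructor is no longer accessible,
        // which is what produced the IllegalAccessError you hit.
        // val conf = new CrailConfiguration()

        // New API: static factory that loads crail-site.conf from
        // $CRAIL_HOME/conf (method name as in the current Crail docs)
        val conf = CrailConfiguration.createConfigurationFromFile()

        // The Crail store is then created from the configuration as before.
        val store = CrailStore.newInstance(conf)
        store.close()
      }
    }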

Please also remove the old jars from the directory the classpath points to:
if there are multiple jars of different versions on the classpath, it is
unclear which one will be picked up.

Best regards
Adrian

On 6/19/19 13:43, Adrian Schuepbach wrote:
> Hi David
>
> This is caused by the API change for creating a Crail configuration object.
> Instead of the empty constructor, the new API provides three different
> static methods to create the Crail configuration.
>
> I am adapting the dependent repositories to the new API.
>
> What is a bit unclear to me is why you hit this. crail-dispatcher depends
> on crail-client 1.0, but the new API is only available on the current
> master (version 1.2-incubating-SNAPSHOT).
>
> If you built Apache Crail from source, you get 1.2-incubating-SNAPSHOT
> rather than the 1.0 version. In that case I would have expected that you
> could not even build crail-spark-io.
>
> In any case, the fix will be ready shortly.
>
> Regards
> Adrian
>
> On 6/19/19 09:21, Jonas Pfefferle wrote:
>> Hi David,
>>
>>
>> I assume you are running with the latest Crail master. We just pushed a
>> change to the CrailConfiguration initialization which we have not yet
>> adapted in the shuffle plugin (should be a one-line fix). @Adrian, can
>> you take a look?
>>
>> Regards,
>> Jonas
>>
>>  On Tue, 18 Jun 2019 23:24:48 +0000
>>  David Crespi <[email protected]> wrote:
>>> Hi,
>>> I’m getting what looks to be a configuration error when trying to use
>>> the CrailShuffleManager.
>>> (spark.shuffle.manager           
>>> org.apache.spark.shuffle.crail.CrailShuffleManager)
>>>
>>> It seems like a basic error, but other things run okay until I add the
>>> line above to my spark-defaults.conf file.
>>> I have the environment variable for the Crail home set, as well as the
>>> one for the DiSNI libs, using:
>>> LD_LIBRARY_PATH=/usr/local/lib
>>> $ ls -l /usr/local/lib/
>>> total 156
>>> -rwxr-xr-x 1 root root       947 Jun 18 08:11 libdisni.la
>>> lrwxrwxrwx 1 root root      17 Jun 18 08:11 libdisni.so ->
>>> libdisni.so.0.0.0
>>> lrwxrwxrwx 1 root root      17 Jun 18 08:11 libdisni.so.0 ->
>>> libdisni.so.0.0.0
>>> -rwxr-xr-x 1 root root  149784 Jun 18 08:11 libdisni.so.0.0.0
>>>
>>> I also have an environment variable for the classpath set:
>>> CLASSPATH=/disni/target/*:/jNVMf/target/*:/crail/jars/*
>>>
>>> Could the classpath variable be the issue?
>>>
>>> 19/06/18 15:59:47 DEBUG Client: getting client out of cache:
>>> org.apache.hadoop.ipc.Client@7bebcd65
>>> 19/06/18 15:59:47 DEBUG PerformanceAdvisory: Both short-circuit local
>>> reads and UNIX domain socket are disabled.
>>> 19/06/18 15:59:47 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>> not using SaslPropertiesResolver, no QOP found in configuration for
>>> dfs.data.transfer.protection
>>> 19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0 stored as
>>> values in memory (estimated size 288.9 KB, free 366.0 MB)
>>> 19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0 locally
>>> took  123 ms
>>> 19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0
>>> without replication took  125 ms
>>> 19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0_piece0 stored
>>> as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>>> 19/06/18 15:59:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>> memory on master:34103 (size: 23.8 KB, free: 366.3 MB)
>>> 19/06/18 15:59:48 DEBUG BlockManagerMaster: Updated info of block
>>> broadcast_0_piece0
>>> 19/06/18 15:59:48 DEBUG BlockManager: Told master about block
>>> broadcast_0_piece0
>>> 19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0_piece0
>>> locally took  7 ms
>>> 19/06/18 15:59:48 DEBUG BlockManager: Putting block
>>> broadcast_0_piece0 without replication took  8 ms
>>> 19/06/18 15:59:48 INFO SparkContext: Created broadcast 0 from
>>> newAPIHadoopFile at TeraSort.scala:60
>>> 19/06/18 15:59:48 DEBUG Client: The ping interval is 60000 ms.
>>> 19/06/18 15:59:48 DEBUG Client: Connecting to
>>> NameNode-1/192.168.3.7:54310
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser: starting, having connections 1
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser sending #0
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser got value #0
>>> 19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getFileInfo took 56ms
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser sending #1
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser got value #1
>>> 19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getListing took 3ms
>>> 19/06/18 15:59:48 DEBUG FileInputFormat: Time taken to get
>>> FileStatuses: 142
>>> 19/06/18 15:59:48 INFO FileInputFormat: Total input paths to process : 2
>>> 19/06/18 15:59:48 DEBUG FileInputFormat: Total # of splits generated
>>> by getSplits: 2, TimeTaken: 145
>>> 19/06/18 15:59:48 DEBUG FileCommitProtocol: Creating committer
>>> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>> output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>>> 19/06/18 15:59:48 DEBUG FileCommitProtocol: Using (String, String,
>>> Boolean) constructor
>>> 19/06/18 15:59:48 INFO FileOutputCommitter: File Output Committer
>>> Algorithm version is 1
>>> 19/06/18 15:59:48 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>> masked=rwxr-xr-x
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser sending #2
>>> 19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
>>> NameNode-1/192.168.3.7:54310 from hduser got value #2
>>> 19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>>> 19/06/18 15:59:48 DEBUG ClosureCleaner: Cleaning lambda:
>>> $anonfun$write$1
>>> 19/06/18 15:59:48 DEBUG ClosureCleaner:  +++ Lambda closure
>>> ($anonfun$write$1) is now cleaned +++
>>> 19/06/18 15:59:48 INFO SparkContext: Starting job: runJob at
>>> SparkHadoopWriter.scala:78
>>> 19/06/18 15:59:48 INFO CrailDispatcher: CrailStore starting version 400
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteonclose false
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteOnStart true
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.preallocate 0
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.writeAhead 0
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.debug false
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.serializer
>>> org.apache.spark.serializer.CrailSparkSerializer
>>> 19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>> true
>>> 19/06/18 15:59:48 INFO CrailDispatcher:
>>> spark.crail.shuffle.outstanding 1
>>> 19/06/18 15:59:48 INFO CrailDispatcher:
>>> spark.crail.shuffle.storageclass 0
>>> 19/06/18 15:59:48 INFO CrailDispatcher:
>>> spark.crail.broadcast.storageclass 0
>>> Exception in thread "dag-scheduler-event-loop"
>>> java.lang.IllegalAccessError: tried to access method
>>> org.apache.crail.conf.CrailConfiguration.<init>()V from class
>>> org.apache.spark.storage.CrailDispatcher
>>>        at
>>> org.apache.spark.storage.CrailDispatcher.org$apache$spark$storage$CrailDispatcher$$init(CrailDispatcher.scala:119)
>>>        at
>>> org.apache.spark.storage.CrailDispatcher$.get(CrailDispatcher.scala:662)
>>>        at
>>> org.apache.spark.shuffle.crail.CrailShuffleManager.registerShuffle(CrailShuffleManager.scala:52)
>>>        at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
>>>        at
>>> org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
>>>        at
>>> org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:240)
>>>        at scala.Option.getOrElse(Option.scala:138)
>>>        at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
>>>        at
>>> org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
>>>        at
>>> org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
>>>        at
>>> org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
>>>        at
>>> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
>>>        at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
>>>        at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
>>>        at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
>>>        at
>>> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>>>
>>> Regards,
>>>
>>>           David
>>>
>>>
>>
-- 
Adrian Schüpbach, Dr. sc. ETH Zürich
