Re: on shark, is tachyon less efficient than memory_only cache strategy?
Hi Haoyuan, thanks for replying.

2014-07-21 16:29 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:

Qingyang, aha, got it. 800 MB of data is pretty small. Loading from Tachyon does have a bit of extra overhead, but the benefit grows as the data size grows. Also, if you store the table in Tachyon, different Shark servers can query the same data at the same time. For more on the trade-offs, please refer to this page: http://tachyon-project.org/Running-Shark-on-Tachyon.html

Best, Haoyuan

On Wed, Jul 16, 2014 at 12:06 AM, qingyang li liqingyang1...@gmail.com wrote:

Let me describe my setup: I have a Spark cluster and a Tachyon cluster on 8 machines (24 cores, 16 GB memory per machine). I created a table containing 800 MB of data on Tachyon; a query against it through Shark takes 2.43 s, but when I create the same table in Spark memory, the same query takes 1.56 s. So the data on Tachyon costs more time than the data in Spark memory. Both runs use 150 map tasks, 16-20 per node. As I understand it, when the data is on Tachyon, Shark has each Spark slave load data from the Tachyon worker on the same node, so locality should not be the issue. I have tried tuning Shark and Tachyon configuration, but still cannot get the Tachyon case faster than 2.43 s. Does anyone have ideas?

By the way, my Tachyon block size is currently 1 GB. I want to change it; will setting tachyon.user.default.block.size.byte=8M work? If not, what does tachyon.user.default.block.size.byte mean?

2014-07-14 13:13 GMT+08:00 qingyang li liqingyang1...@gmail.com:
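On the block-size question above: tachyon.user.default.block.size.byte is the per-block size, in bytes, that the Tachyon client uses when writing a new file, so changing it only affects data written afterwards, and the value is a plain byte count rather than a suffixed size like 8M. A minimal sketch, assuming the 0.4.x-era style of configuring the Tachyon client through JVM system properties (the property must take effect before the Shark/Spark JVMs write data, e.g. via their java opts):

    // hypothetical: 8 MB blocks for files written through this client
    // (equivalent to passing -Dtachyon.user.default.block.size.byte=8388608)
    System.setProperty("tachyon.user.default.block.size.byte",
                       (8 * 1024 * 1024).toString)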
Re: on shark, is tachyon less efficient than memory_only cache strategy?
Shark. Thanks for replying. Let me state my question again: I create a table using

    create table xxx1 tblproperties("shark.cache" = "tachyon") as select * from xxx2

When executing SQL through Shark (for example, select * from xxx1), Shark reads the data into its own memory from Tachyon's memory. If Shark loads the data from Tachyon on every query, that seems inefficient. Could we use some cache policy (such as CacheAllPolicy, FIFOCachePolicy, or LRUCachePolicy) to cache the data and avoid re-reading it from Tachyon for each SQL query?

2014-07-14 2:47 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:

Qingyang, are you asking about Spark or Shark? (The first email was about Shark, the last email about Spark.) Best, Haoyuan

On Wed, Jul 9, 2014 at 7:40 PM, qingyang li liqingyang1...@gmail.com wrote:
When inserting data into a table on Tachyon, how can I control the data placement?
When inserting data (the data is small, so it will not be partitioned automatically) into a table that lives on Tachyon, how can I control the data placement, i.e. how can I choose which machine the data ends up on? If it cannot be controlled, what is the data-placement strategy of Tachyon or Spark?
Re: on shark, is tachyon less efficient than memory_only cache strategy?
Could I set some cache policy so that Spark loads data from Tachyon only once for all SQL queries, for example CacheAllPolicy, FIFOCachePolicy, or LRUCachePolicy? I have tried those three policies and they did not help. If Spark reloads the data for every SQL query, it will hurt query speed and take more time than when the data is managed by Spark itself.

2014-07-09 1:19 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:

Yes. For Shark, the two modes shark.cache=tachyon and shark.cache=memory have the same ser/de overhead. In Tachyon mode, Shark loads data from outside of the process, with the following benefits:

- In-memory data sharing across multiple Shark instances (i.e. stronger isolation)
- Instant recovery of in-memory tables
- Reduced heap size, hence faster GC in Shark
- If the table is larger than the memory size, only the hot columns are cached in memory

(from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon)

Haoyuan

On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson ilike...@gmail.com wrote:

Shark's in-memory format is already serialized (it's compressed and column-based).

On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan mri...@gmail.com wrote:

You are ignoring serde costs :-) - Mridul

On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson ilike...@gmail.com wrote:

Tachyon should be only marginally less performant than memory_only, because we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer the data over a pipe from Tachyon; we can read directly from the buffers, the same way Shark reads from its in-memory columnar format.

On Tue, Jul 8, 2014 at 1:18 AM, qingyang li liqingyang1...@gmail.com wrote:
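On the policy question: in Shark those policy classes govern eviction of a table's cached partitions and are selected per table, not globally, so they control what stays cached rather than forcing a one-time load. A hypothetical sketch of the syntax, assuming the shark.cache.policy table property and a SharkContext sql helper (both the property name and the API are assumptions, not confirmed in this thread):

    // hypothetical: LRU eviction for a cached, partitioned table
    sc.sql("""CREATE TABLE t (k INT, v STRING) PARTITIONED BY (dt STRING)
              TBLPROPERTIES ('shark.cache' = 'memory',
                             'shark.cache.policy' = 'LRUCachePolicy')""")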
on shark, is tachyon less efficient than memory_only cache strategy?
Hi, when I create a table I can choose the cache strategy with shark.cache. I think shark.cache=memory_only means the data is managed by Spark and lives in the same JVM as the executor, while shark.cache=tachyon means the data is managed by Tachyon, off-heap, not in the same JVM as the executor, so Spark loads data from Tachyon for every SQL query. So, is Tachyon less efficient than the memory_only cache strategy? If yes, can we make Spark load all the data from Tachyon once for all SQL queries? I want to use the Tachyon cache strategy, since Tachyon offers better availability than memory_only.
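For reference, the two strategies compared above differ only in the table property. A minimal sketch, assuming a SharkContext handle sc whose sql helper executes a statement (table names are placeholders):

    // heap-cached: partitions live inside the executors' JVMs, managed by Spark
    sc.sql("CREATE TABLE t_mem TBLPROPERTIES ('shark.cache' = 'memory') AS SELECT * FROM src")
    // off-heap: partitions live in Tachyon, outside the executor JVMs; they can be
    // shared by several Shark servers and survive executor restarts
    sc.sql("CREATE TABLE t_tachyon TBLPROPERTIES ('shark.cache' = 'tachyon') AS SELECT * FROM src")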
Re: task always lost
The executor keeps being removed. Someone else encountered the same issue: https://groups.google.com/forum/#!topic/spark-users/-mYn6BF-Y5Y

14/07/02 17:41:16 INFO storage.BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
14/07/02 17:41:16 INFO storage.BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
14/07/02 17:41:16 DEBUG spark.MapOutputTrackerMaster: Increasing epoch to 10
14/07/02 17:41:16 INFO scheduler.DAGScheduler: Host gained which was in lost list earlier: bigdata001
14/07/02 17:41:16 DEBUG scheduler.TaskSchedulerImpl: parentName: , name: TaskSet_0, runningTasks: 0
14/07/02 17:41:16 DEBUG scheduler.TaskSchedulerImpl: parentName: , name: TaskSet_0, runningTasks: 0
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 12 on executor 20140616-143932-1694607552-5050-4080-3: bigdata004 (NODE_LOCAL)
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 10785 bytes in 1 ms
14/07/02 17:41:16 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 13 on executor 20140616-104524-1694607552-5050-26919-3: bigdata002 (NODE_LOCAL)

2014-07-02 12:01 GMT+08:00 qingyang li liqingyang1...@gmail.com:
Re: task always lost
Here is the log:

E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor container for executor 20140616-104524-1694607552-5050-26919-1 of framework 20140702-102939-1694607552-5050-14846-: Not monitored

2014-07-02 1:45 GMT+08:00 Aaron Davidson ilike...@gmail.com:

Can you post the logs from any of the dying executors?

On Tue, Jul 1, 2014 at 1:25 AM, qingyang li liqingyang1...@gmail.com wrote:

I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but when I use spark-shell to submit a job, the tasks are always lost. Here is the log:

14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata001
14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes in 0 ms

It seems someone else has encountered this problem too: http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3c201305161047069952...@nfs.iscas.ac.cn%3E
Re: task always lost
Also, this is in the warning log:

E0702 11:35:08.869998 17840 slave.cpp:2310] Container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1' for executor '20140616-104524-1694607552-5050-26919-1' of framework '20140702-113428-1694607552-5050-17766-' failed to start: Failed to fetch URIs for container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1': exit status 32512

2014-07-02 11:46 GMT+08:00 qingyang li liqingyang1...@gmail.com:
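For what it's worth, exit status 32512 looks like a raw wait(2)-style status, where the real exit code sits in the high byte; decoding it suggests the fetch command itself failed to launch. A sketch of the arithmetic (the interpretation, e.g. a missing hadoop client on the slave's PATH for hdfs:// executor URIs, is an assumption):

    // 32512 decoded as a shell wait status: the exit code lives in the high byte
    val status = 32512
    val exitCode = status >> 8   // 32512 / 256 = 127, the shell's "command not found"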
Re: encounter jvm problem when integrating spark with mesos
Somebody else has also encountered this problem: http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3cafc0d60983129f4f9fbad571aa422c9a5af8f...@mail-mbx1.ad.renci.org%3E

2014-06-17 12:31 GMT+08:00 Andrew Ash and...@andrewash.com:

Hi qingyang, this looks like an issue with the open-source Java runtime (OpenJDK) that causes the JVM to crash. Can you try the JVM released by Oracle and see if it has the same issue? Thanks! Andrew

On Mon, Jun 16, 2014 at 9:24 PM, qingyang li liqingyang1...@gmail.com wrote:
Re: encounter jvm problem when integrating spark with mesos
Here is the core stack info:

(gdb) bt
#0  0x7fc0153fc925 in raise () from /lib64/libc.so.6
#1  0x7fc0153fe105 in abort () from /lib64/libc.so.6
#2  0x7fc014d78405 in os::abort(bool) () from /home/zjw/jdk1.7/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x7fc014ef7347 in VMError::report_and_die() () from /home/zjw/jdk1.7/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x7fc014d7cd8f in JVM_handle_linux_signal () from /home/zjw/jdk1.7/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  signal handler called
#6  0x7fc014b96ce9 in jni_GetByteArrayElements () from /home/zjw/jdk1.7/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x7fbff70f002c in GetByteArrayElements (env=<value optimized out>, jobj=0x7fbf8c000f80) at /home/zjw/jdk1.7/jdk1.7.0_51//include/jni.h:1668
#8  construct<mesos::FrameworkInfo> (env=<value optimized out>, jobj=0x7fbf8c000f80) at ../../src/java/jni/construct.cpp:123
#9  0x7fbff70f51c8 in Java_org_apache_mesos_MesosSchedulerDriver_initialize (env=0x7fc010d189e8, thiz=0x7fbfd94f6830) at ../../src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp:528

2014-06-17 16:12 GMT+08:00 andy petrella andy.petre...@gmail.com:

Yep, but no real resolution nor progress on this topic, since we eventually chose to stick with a compatible version of Mesos (0.14.1 for the moment). But I'm still convinced it has to do with a native-library clash :-s

aℕdy ℙetrella about.me/noootsab

On Tue, Jun 17, 2014 at 9:57 AM, qingyang li liqingyang1...@gmail.com wrote:
encounter jvm problem when integrating spark with mesos
Hi, I hit a JVM problem when integrating Spark with Mesos. Here is the log when I run spark-shell:

-48ce131dc5af
14/06/17 12:24:55 INFO HttpServer: Starting HTTP Server
14/06/17 12:24:55 INFO SparkUI: Started Spark Web UI at http://bigdata001:4040
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f94f4843d21, pid=5956, tid=140277175580416
#
# JRE version: OpenJDK Runtime Environment (7.0_51-b02) (build 1.7.0_51-mockbuild_2014_01_15_01_39-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.45-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x5e5d21]  JNI_CreateJavaVM+0x6551
#
# Core dump written. Default location: /home/zjw/spark/spark-0.9.0-incubating-bin-hadoop2/core or core.5956
#
# An error report file with more information is saved as:
# /tmp/jvm-5956/hs_error.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
# http://icedtea.classpath.org/bugzilla
#
bin/spark-shell: line 101: 5956 Aborted (core dumped) $FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main $@
how does spark partition data when creating a table via create table xxx as select * from xxx?
Hi Spark developers, I am using Shark/Spark and am puzzled by a question I cannot find answered on the web, so I am asking here.

1. How does Spark partition data in memory when creating a table with create table a tblproperties("shark.cache" = "memory") as select * from b? In other words, how many RDD partitions will be created, and how does Spark decide that number?

2. How does Spark partition data on Tachyon when creating a table with create table a tblproperties("shark.cache" = "tachyon") as select * from b? In other words, how many files will be created, and how does Spark decide the number of files? I found the Tachyon setting tachyon.user.default.block.size.byte; what does it mean? Could I set it to control the size of each file?

Thanks for any guidance.
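For question 1, the partition count of an input RDD generally follows the number of underlying splits (one per block of the source by default), which is also why the block-size setting matters. A minimal sketch for inspecting it from the Spark shell (the path is a placeholder):

    // one partition per input split/block by default
    val rdd = sc.textFile("tachyon://master:19998/tables/table_b")  // hypothetical path
    println(rdd.partitions.size)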
Re: can RDD be shared across mutil spark applications?
Thanks for sharing. I am using Tachyon to store RDDs now.

2014-05-18 12:02 GMT+08:00 Christopher Nguyen c...@adatao.com:

Qing Yang, Andy is correct in answering your direct question. At the same time, depending on your context, you may be able to apply a pattern where you turn the single Spark application into a service; multiple clients of that service can indeed share access to the same RDDs. Several groups have built apps based on this pattern, and we will also show something with this behavior at the upcoming Spark Summit (multiple users collaborating on named DDFs with the same underlying RDDs). Sent while mobile. Pls excuse typos etc.

On May 18, 2014 9:40 AM, Andy Konwinski andykonwin...@gmail.com wrote:

RDDs cannot currently be shared across multiple SparkContexts without using something like the Tachyon project (which is a separate project/codebase). Andy

On May 16, 2014 2:14 PM, qingyang li liqingyang1...@gmail.com wrote:
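For context, sharing through Tachyon works by writing from one application and reading from another via the tachyon:// filesystem scheme. A minimal sketch, assuming the Tachyon client jar is on both applications' classpaths and the scheme is wired into the Hadoop FileSystem configuration (master host and paths are placeholders):

    // application A: persist an RDD's data where other apps can see it
    rdd.saveAsTextFile("tachyon://master:19998/shared/users")

    // application B: a separate SparkContext in a different JVM reads it back
    val shared = sc.textFile("tachyon://master:19998/shared/users")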
can RDDs be shared across multiple spark applications?
get -101 error code when running select query
Hi, I have started a sharkserver2 and am using Java code to send queries to it over Hive JDBC, but I get this error:

FAILED: Execution Error, return code -101 from shark.execution.SparkTask
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
    at shark.server.SharkSQLOperation.run(SharkSQLOperation.scala:45)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:180)
    at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:152)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$2.run(TUGIContainingProcessor.java:64)
    at org.apache.hive.service.auth.TUGIContainingProcessor$2.run(TUGIContainingProcessor.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524)
    at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:61)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Has anyone encountered this problem?
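For comparison, the client side of such a setup looks roughly like the following (a sketch; host, port, and table name are placeholders, and the HiveServer2 JDBC driver is assumed to be on the classpath):

    import java.sql.DriverManager

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://bigdata001:10000/default", "", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SELECT count(*) FROM some_table")  // hypothetical table
    while (rs.next()) println(rs.getLong(1))
    conn.close()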
Re: get -101 error code when running select query
Thanks for sharing. My case is different from yours: I set hive.server2.enable.doAs to false in hive-site.xml, and the -101 error code disappeared.

2014-04-24 9:26 GMT+08:00 Madhu ma...@madhu.com:

I have seen a similar error message when connecting to Hive through JDBC. This is just a guess on my part, but check your query. The error occurs if you have a select that includes a null literal with an alias, like this:

    select a, b, null as c, d from foo

In my case, rewriting the query to use an empty string or another literal instead of null worked:

    select a, b, '' as c, d from foo

I think the problem is the lack of type information when supplying a null literal.
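For anyone applying the same fix, the property goes in hive-site.xml on the server side (the value shown is the one reported above to make the -101 error disappear):

    <property>
      <name>hive.server2.enable.doAs</name>
      <value>false</value>
    </property>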
Re: does shark 0.9.1 work well with hadoop 2.2.0?
This can help resolve the protobuf version problem, too: https://groups.google.com/forum/#!msg/shark-users/0pGIVQvaYfo/-43oaK8scNAJ

2014-04-20 23:53 GMT+08:00 Gordon Wang gw...@gopivotal.com:

Replacing the jar is not enough. You have to change the protobuf dependency in Shark's build script and recompile the source. Protobuf 2.4.1 and 2.5.0 are not binary compatible. Regards, Gordon Wang

On Sun, Apr 20, 2014 at 6:45 PM, qingyang li liqingyang1...@gmail.com wrote:
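Gordon's suggestion amounts to pinning a single protobuf version before recompiling. A hypothetical sketch for an sbt-style build definition such as Shark's (whether the dependency is declared directly or arrives transitively, an override forces one version onto the classpath; the exact file and setting to touch in Shark's build is an assumption):

    // hypothetical addition to Shark's sbt build definition
    dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.5.0"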
does shark 0.9.1 work well with hadoop 2.2.0?
Shark 0.9.1 uses protobuf 2.4.1, but Hadoop 2.2.0 uses protobuf 2.5.0. How can we make them work together? I have tried replacing protobuf 2.4.1 in Shark with protobuf 2.5.0, and it does not work. I have also tried replacing protobuf 2.5.0 in Hadoop with Shark's 2.4.1, and that does not work either.
Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1
Egor, I have encountered the same problem you asked about in this thread: http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%3CCAMrx5DwJVJS0g_FE7_2qwMu4Xf0y5VfV=tlyauv2kh5v4k6...@mail.gmail.com%3E

Have you fixed it? I am using Shark to read a table I created on HDFS, and in Shark's lib_managed directory I found two protobuf jars:

[root@bigdata001 shark-0.9.0]# find . -name proto*.jar
./lib_managed/jars/org.spark-project.protobuf/protobuf-java/protobuf-java-2.4.1-shaded.jar
./lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar

My Hadoop uses protobuf-java-2.5.0.jar.
Re: Error executing sql using shark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1
My Spark also works well with Hadoop 2.2.0, but my Shark does not, because of the protobuf version problem. In the Shark directory I found two versions of protobuf, and both are loaded onto the classpath:

[root@bigdata001 shark-0.9.0]# find . -name proto*.jar
./lib_managed/jars/org.spark-project.protobuf/protobuf-java/protobuf-java-2.4.1-shaded.jar
./lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar

2014-03-27 2:26 GMT+08:00 yao yaosheng...@gmail.com:

@qingyang, Spark 0.9.0 works perfectly for me when accessing (read/write) data on HDFS. BTW, if you look at pom.xml, you have to choose the yarn profile to compile Spark so that it won't include protobuf 2.4.1 in your final jars. Here is the command line we use to compile Spark with Hadoop 2.2:

    mvn -U -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pyarn -DskipTests package

Thanks -Shengzhe

On Wed, Mar 26, 2014 at 12:04 AM, qingyang li liqingyang1...@gmail.com wrote:
Re: how to config worker HA
Can someone help me?

2014-03-12 21:26 GMT+08:00 qingyang li liqingyang1...@gmail.com:

In addition: on https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#hadoop-datasets I see that an RDD can be stored using different storage levels, and that StorageLevel has an attribute MEMORY_ONLY_2: "Same as the levels above, but replicate each partition on two cluster nodes."

1. Is this a form of fault tolerance?
2. Would replicating each partition on two cluster nodes help with worker-node HA?
3. Is there a MEMORY_ONLY_3 that would replicate each partition on three cluster nodes? (See the sketch after this message.)

2014-03-12 12:11 GMT+08:00 qingyang li liqingyang1...@gmail.com:

I have one table in memory; when one worker dies, I can no longer query data from that table. Here is its storage status (http://192.168.1.101:4040/storage/rdd?id=47):

    RDD Name:          table01
    Storage Level:     Memory Deserialized 1x Replicated
    Cached Partitions: 119
    Fraction Cached:   88%
    Size in Memory:    697.0 MB
    Size on Disk:      0.0 B

So, my questions are:
1. What does "Memory Deserialized 1x Replicated" mean?
2. How do I configure worker HA so that I can still query data when one worker is dead?
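On questions 2 and 3 above: the storage level is chosen per RDD when it is persisted, and as of Spark 0.9 the named constants stop at two replicas, but StorageLevel's factory method takes an explicit replication count. A minimal sketch (the three-replica level is a custom construction, not a built-in constant):

    import org.apache.spark.storage.StorageLevel

    // each partition replicated on two nodes, so one worker's death
    // does not lose the cached copy
    rdd.persist(StorageLevel.MEMORY_ONLY_2)
    // no MEMORY_ONLY_3 constant exists, but a custom level can request
    // three replicas: (useDisk, useMemory, deserialized, replication)
    val memoryOnly3 = StorageLevel(false, true, true, 3)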
java.lang.IllegalAccessError: tried to access field org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.conf from class org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
Dear community, I built Shark 0.9 with:

    export SHARK_HADOOP_VERSION=2.2.0
    sbt/sbt package

But when I run bin/shark, I get this error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access field org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.conf from class org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
    at org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator.setConf(ProxyUserAuthenticator.java:40)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:365)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:270)
    at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
    at shark.SharkCliDriver.main(SharkCliDriver.scala)

It seems conf is a private field. Does anyone know how to resolve this problem?
Re: is there a shark 0.9 build that can be downloaded?
It is too slow to build Shark from the latest source code. Is there a Shark 0.9 build available for download?

2014-03-11 9:29 GMT+08:00 qingyang li liqingyang1...@gmail.com:

Does anyone know whether there is a Shark 0.9 build that can be downloaded? If not, when will there be one?