[jira] [Updated] (SPARK-20079) Re-registration of AM hangs Spark cluster in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-20079:
Description:
The ExecutorAllocationManager.reset method is called when the AM re-registers, which sets the ExecutorAllocationManager.initializing field to true. While this field is true, the driver does not request new executors from the AM. Only two events set the field back to false:
1. An executor has been idle for some time.
2. A new stage is submitted.
If the AM is killed and restarted after a stage has already been submitted, neither event can occur:
1. When the AM is killed, YARN kills all of its running containers, so every executor is lost and none can become idle.
2. With no surviving executors, the current stage can never complete, so the DAGScheduler never submits a new stage.
Reproduction steps:
1. Start the cluster:
{noformat}
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
{noformat}
2. Kill the AM process while a stage is scheduled.

was:
1. Start cluster
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
2. Kill the AM process when a stage is scheduled.

> Re-registration of AM hangs Spark cluster in yarn-client mode
> -
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.0
> Reporter: Guoqiang Li
>
> The ExecutorAllocationManager.reset method is called when the AM re-registers,
> which sets the ExecutorAllocationManager.initializing field to true. While this
> field is true, the driver does not request new executors from the AM.
> Only two events set the field back to false:
> 1. An executor has been idle for some time.
> 2. A new stage is submitted.
> If the AM is killed and restarted after a stage has already been submitted,
> neither event can occur:
> 1. When the AM is killed, YARN kills all of its running containers, so every
> executor is lost and none can become idle.
> 2. With no surviving executors, the current stage can never complete, so the
> DAGScheduler never submits a new stage.
> Reproduction steps:
> 1. Start the cluster:
> {noformat}
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
> {noformat}
> 2. Kill the AM process while a stage is scheduled.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
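The deadlock described above can be sketched as a tiny state machine. This is a hedged, illustrative model only — the class and method names below are stand-ins, not Spark's actual ExecutorAllocationManager API — but it shows why, once `initializing` is set by a reset with no surviving executors, neither of the two clearing events can ever fire.

```java
// Toy model of the hang (illustrative names, not Spark's real API).
class AllocationManagerSketch {
    private boolean initializing = false;
    private int liveExecutors;

    AllocationManagerSketch(int liveExecutors) { this.liveExecutors = liveExecutors; }

    // Called when the AM re-registers.
    void reset() { initializing = true; }

    // While initializing, the driver makes no executor requests to the AM.
    boolean wouldRequestExecutors() { return !initializing; }

    // The only two events that clear the flag:
    void onExecutorIdle() {                    // requires a live executor
        if (liveExecutors > 0) initializing = false;
    }
    void onNewStageSubmitted() { initializing = false; }

    // YARN killing the AM also kills every container, then the AM restarts.
    void amKilledAndRestarted() { liveExecutors = 0; reset(); }
}
```

With `liveExecutors` forced to 0 by the AM kill, `onExecutorIdle` can never fire, and since the current stage cannot finish without executors, `onNewStageSubmitted` never fires either — so `wouldRequestExecutors()` stays false forever.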
[jira] [Commented] (SPARK-20079) Re-registration of AM hangs Spark cluster in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949164#comment-15949164 ] Guoqiang Li commented on SPARK-20079: - I hit this problem while running a reliability test. After the AM restarted, it did not request a single new executor in more than ten hours.

> Re-registration of AM hangs Spark cluster in yarn-client mode
> -
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.0
> Reporter: Guoqiang Li
>
> 1. Start cluster
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
> 2. Kill the AM process when a stage is scheduled.
[jira] [Created] (SPARK-20079) Re-registration of AM hangs Spark cluster in yarn-client mode
Guoqiang Li created SPARK-20079:
---
Summary: Re-registration of AM hangs Spark cluster in yarn-client mode
Key: SPARK-20079
URL: https://issues.apache.org/jira/browse/SPARK-20079
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 2.1.0
Reporter: Guoqiang Li

1. Start the cluster:
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
2. Kill the AM process while a stage is scheduled.
[jira] [Commented] (SPARK-18890) Do all task serialization in CoarseGrainedExecutorBackend thread (rather than TaskSchedulerImpl)
[ https://issues.apache.org/jira/browse/SPARK-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931115#comment-15931115 ] Guoqiang Li commented on SPARK-18890: - [~kayousterhout] done.

> Do all task serialization in CoarseGrainedExecutorBackend thread (rather than
> TaskSchedulerImpl)
>
> Key: SPARK-18890
> URL: https://issues.apache.org/jira/browse/SPARK-18890
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Kay Ousterhout
> Priority: Minor
>
> As part of benchmarking this change:
> https://github.com/apache/spark/pull/15505 and alternatives, [~shivaram] and
> I found that moving task serialization from TaskSetManager (which happens as
> part of the TaskSchedulerImpl's thread) to CoarseGrainedSchedulerBackend leads
> to approximately a 10% reduction in job runtime for a job that counted 10,000
> partitions (each holding 1 int) using 20 machines. Similar performance
> improvements were reported in the pull request linked above. This would
> appear to be because the TaskSchedulerImpl thread is the bottleneck, so
> moving serialization to CGSB reduces runtime. This change may *not* improve
> runtime (and could potentially worsen runtime) in scenarios where the CGSB
> thread is the bottleneck (e.g., if tasks are very large, so calling launch to
> send the tasks to the executor blocks on the network).
> One benefit of implementing this change is that it makes it easier to
> parallelize the serialization of tasks (different tasks could be serialized
> by different threads). Another benefit is that all of the serialization
> occurs in the same place (currently, the Task is serialized in
> TaskSetManager, and the TaskDescription is serialized in CGSB).
> I'm not totally convinced we should fix this because it seems like there are
> better ways of reducing the serialization time (e.g., by re-using a single
> serialized object with the Task/jars/files and broadcasting it for each
> stage) but I wanted to open this JIRA to document the discussion.
> cc [~witgo]
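The "different tasks could be serialized by different threads" benefit mentioned above can be sketched with a plain JDK thread pool. This is a hedged illustration using ordinary Java serialization, not Spark's actual Task/TaskDescription encoding:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelTaskSerialization {
    // Serialize one task; in Spark this would be the Task + TaskDescription encoding.
    static byte[] serialize(Serializable task) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(task);
        }
        return bos.toByteArray();
    }

    // Once serialization lives in one place, tasks can be farmed out to a pool
    // instead of being serialized one by one on the scheduler thread.
    static List<byte[]> serializeAll(List<? extends Serializable> tasks, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (Serializable t : tasks) {
                futures.add(pool.submit(() -> serialize(t)));
            }
            List<byte[]> out = new ArrayList<>();
            for (Future<byte[]> f : futures) out.add(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note the trade-off from the discussion still applies: the pool only helps when serialization, not the network send, is the bottleneck.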
[jira] [Created] (SPARK-19991) FileSegmentManagedBuffer performance improvement.
Guoqiang Li created SPARK-19991:
---
Summary: FileSegmentManagedBuffer performance improvement.
Key: SPARK-19991
URL: https://issues.apache.org/jira/browse/SPARK-19991
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 2.1.0, 2.0.2
Reporter: Guoqiang Li
Priority: Minor

When the configuration items {{spark.storage.memoryMapThreshold}} and {{spark.shuffle.io.lazyFD}} are not set, each call to FileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream creates a NoSuchElementException instance, which is a relatively expensive operation. The shuffle-server thread's stack:
{noformat}
"shuffle-server-2-42" #335 daemon prio=5 os_prio=0 tid=0x7f71e4507800 nid=0x28d12 runnable [0x7f71af93e000] java.lang.Thread.State: RUNNABLE at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.fillInStackTrace(Throwable.java:783) - locked <0x0007a930f080> (a java.util.NoSuchElementException) at java.lang.Throwable.(Throwable.java:265) at java.lang.Exception.(Exception.java:66) at java.lang.RuntimeException.(RuntimeException.java:62) at java.util.NoSuchElementException.(NoSuchElementException.java:57) at org.apache.spark.network.yarn.util.HadoopConfigProvider.get(HadoopConfigProvider.java:38) at org.apache.spark.network.util.ConfigProvider.get(ConfigProvider.java:31) at org.apache.spark.network.util.ConfigProvider.getBoolean(ConfigProvider.java:50) at org.apache.spark.network.util.TransportConf.lazyFileDescriptor(TransportConf.java:157) at org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:132) at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:54) at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33) at org.spark_project.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:88) at 
org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:743) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:735) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:820) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:728) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:284) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:743) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:806) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:818) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:799) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:835) at org.spark_project.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1017) at org.spark_project.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:256) at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194) at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:135) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) at 
org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) at
[jira] [Updated] (SPARK-19991) FileSegmentManagedBuffer performance improvement.
[ https://issues.apache.org/jira/browse/SPARK-19991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-19991:
Description:
When the configuration items {{spark.storage.memoryMapThreshold}} and {{spark.shuffle.io.lazyFD}} are not set, each call to FileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream creates a NoSuchElementException instance, which is a relatively expensive operation. The shuffle-server thread's stack:
{noformat}
"shuffle-server-2-42" #335 daemon prio=5 os_prio=0 tid=0x7f71e4507800 nid=0x28d12 runnable [0x7f71af93e000] java.lang.Thread.State: RUNNABLE at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.fillInStackTrace(Throwable.java:783) - locked <0x0007a930f080> (a java.util.NoSuchElementException) at java.lang.Throwable.(Throwable.java:265) at java.lang.Exception.(Exception.java:66) at java.lang.RuntimeException.(RuntimeException.java:62) at java.util.NoSuchElementException.(NoSuchElementException.java:57) at org.apache.spark.network.yarn.util.HadoopConfigProvider.get(HadoopConfigProvider.java:38) at org.apache.spark.network.util.ConfigProvider.get(ConfigProvider.java:31) at org.apache.spark.network.util.ConfigProvider.getBoolean(ConfigProvider.java:50) at org.apache.spark.network.util.TransportConf.lazyFileDescriptor(TransportConf.java:157) at org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:132) at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:54) at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33) at org.spark_project.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:88) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:743) at 
org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:735) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:820) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:728) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:284) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:743) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:806) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:818) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:799) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:835) at org.spark_project.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1017) at org.spark_project.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:256) at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194) at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:135) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) at org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at 
org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346) at org.spark_project.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) at org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) at
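The cost visible at the top of the stack trace above (fillInStackTrace under NoSuchElementException) and the straightforward mitigation can be sketched in a self-contained way. The ConfigSketch class below mimics — it is not — Spark's ConfigProvider, whose getBoolean selects a default by catching the NoSuchElementException thrown by get(key); when the key is unset, every call on the hot path pays for exception construction. Caching the resolved value once avoids this (the caching helper is a hypothetical illustration, not the actual Spark fix):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class ConfigSketch {
    private final Map<String, String> conf = new HashMap<>();

    // Mimics ConfigProvider.get: throws when the key is absent; the
    // exception's fillInStackTrace is the expensive part seen in the stack.
    String get(String key) {
        String v = conf.get(key);
        if (v == null) throw new NoSuchElementException(key);
        return v;
    }

    // Mimics ConfigProvider.getBoolean: an exception is constructed on
    // every call just to pick the default.
    boolean getBoolean(String key, boolean defaultValue) {
        try {
            return Boolean.parseBoolean(get(key));
        } catch (NoSuchElementException e) {
            return defaultValue;
        }
    }

    // Mitigation sketch: resolve once, reuse the cached value on the hot path.
    private Boolean cachedLazyFD;
    boolean lazyFileDescriptor() {
        if (cachedLazyFD == null) {
            cachedLazyFD = getBoolean("spark.shuffle.io.lazyFD", true);
        }
        return cachedLazyFD;
    }
}
```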
[jira] [Commented] (SPARK-18890) Do all task serialization in CoarseGrainedExecutorBackend thread (rather than TaskSchedulerImpl)
[ https://issues.apache.org/jira/browse/SPARK-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809212#comment-15809212 ] Guoqiang Li commented on SPARK-18890: - OK, I see. In fact, the hardest things to deal with are the Partition objects, including HadoopPartition, JDBCPartition, BlockRDDPartition, and even other user-defined classes. We can only use the Java serializer to serialize instances of these classes.

> Do all task serialization in CoarseGrainedExecutorBackend thread (rather than
> TaskSchedulerImpl)
>
> Key: SPARK-18890
> URL: https://issues.apache.org/jira/browse/SPARK-18890
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Kay Ousterhout
> Priority: Minor
>
> As part of benchmarking this change:
> https://github.com/apache/spark/pull/15505 and alternatives, [~shivaram] and
> I found that moving task serialization from TaskSetManager (which happens as
> part of the TaskSchedulerImpl's thread) to CoarseGrainedSchedulerBackend leads
> to approximately a 10% reduction in job runtime for a job that counted 10,000
> partitions (each holding 1 int) using 20 machines. Similar performance
> improvements were reported in the pull request linked above. This would
> appear to be because the TaskSchedulerImpl thread is the bottleneck, so
> moving serialization to CGSB reduces runtime. This change may *not* improve
> runtime (and could potentially worsen runtime) in scenarios where the CGSB
> thread is the bottleneck (e.g., if tasks are very large, so calling launch to
> send the tasks to the executor blocks on the network).
> One benefit of implementing this change is that it makes it easier to
> parallelize the serialization of tasks (different tasks could be serialized
> by different threads). Another benefit is that all of the serialization
> occurs in the same place (currently, the Task is serialized in
> TaskSetManager, and the TaskDescription is serialized in CGSB).
> I'm not totally convinced we should fix this because it seems like there are
> better ways of reducing the serialization time (e.g., by re-using a single
> serialized object with the Task/jars/files and broadcasting it for each
> stage) but I wanted to open this JIRA to document the discussion.
> cc [~witgo]
[jira] [Commented] (SPARK-18890) Do all task serialization in CoarseGrainedExecutorBackend thread (rather than TaskSchedulerImpl)
[ https://issues.apache.org/jira/browse/SPARK-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753833#comment-15753833 ] Guoqiang Li commented on SPARK-18890: - [~kayousterhout] I ran a test with 10 million tasks and did not see this bottleneck. Can you provide any case where the CGSB becomes a bottleneck? As for using broadcast to transmit tasks: the broadcast size is proportional to the number of tasks (each task has a different location and partition). If each serialized location is 100 bytes, 10 million tasks produce a broadcast of about 953 MB.

> Do all task serialization in CoarseGrainedExecutorBackend thread (rather than
> TaskSchedulerImpl)
>
> Key: SPARK-18890
> URL: https://issues.apache.org/jira/browse/SPARK-18890
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Kay Ousterhout
> Priority: Minor
>
> As part of benchmarking this change:
> https://github.com/apache/spark/pull/15505 and alternatives, [~shivaram] and
> I found that moving task serialization from TaskSetManager (which happens as
> part of the TaskSchedulerImpl's thread) to CoarseGrainedSchedulerBackend leads
> to approximately a 10% reduction in job runtime for a job that counted 10,000
> partitions (each holding 1 int) using 20 machines. Similar performance
> improvements were reported in the pull request linked above. This would
> appear to be because the TaskSchedulerImpl thread is the bottleneck, so
> moving serialization to CGSB reduces runtime. This change may *not* improve
> runtime (and could potentially worsen runtime) in scenarios where the CGSB
> thread is the bottleneck (e.g., if tasks are very large, so calling launch to
> send the tasks to the executor blocks on the network).
> One benefit of implementing this change is that it makes it easier to
> parallelize the serialization of tasks (different tasks could be serialized
> by different threads). Another benefit is that all of the serialization
> occurs in the same place (currently, the Task is serialized in
> TaskSetManager, and the TaskDescription is serialized in CGSB).
> I'm not totally convinced we should fix this because it seems like there are
> better ways of reducing the serialization time (e.g., by re-using a single
> serialized object with the Task/jars/files and broadcasting it for each
> stage) but I wanted to open this JIRA to document the discussion.
> cc [~witgo]
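The back-of-envelope figure in the comment above checks out, assuming the stated 100 bytes per serialized task location:

```java
// Worked arithmetic for the broadcast-size estimate: 10 million tasks at
// ~100 bytes of serialized location data each.
class BroadcastSizeEstimate {
    static double estimateMiB(long tasks, long bytesPerTask) {
        return (tasks * bytesPerTask) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 10,000,000 * 100 B = 1,000,000,000 B ≈ 953.67 MiB
        System.out.println(estimateMiB(10_000_000L, 100L));
    }
}
```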
[jira] [Updated] (SPARK-18383) Utils.isBindCollision does not properly handle all possible address-port collisions when binding
[ https://issues.apache.org/jira/browse/SPARK-18383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18383:
Affects Version/s: 1.6.2
2.0.1

> Utils.isBindCollision does not properly handle all possible address-port
> collisions when binding
>
> Key: SPARK-18383
> URL: https://issues.apache.org/jira/browse/SPARK-18383
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> When the IO mode is set to epoll, Netty uses the {{io.netty.channel.unix.Socket}}
> class, and {{Socket.bind}} throws an {{io.netty.channel.unix.Errors.NativeIoException}}
> instead of a {{java.net.BindException}} instance.
[jira] [Updated] (SPARK-18383) Utils.isBindCollision does not properly handle all possible address-port collisions when binding
[ https://issues.apache.org/jira/browse/SPARK-18383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18383:
Component/s: Spark Core

> Utils.isBindCollision does not properly handle all possible address-port
> collisions when binding
>
> Key: SPARK-18383
> URL: https://issues.apache.org/jira/browse/SPARK-18383
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> When the IO mode is set to epoll, Netty uses the {{io.netty.channel.unix.Socket}}
> class, and {{Socket.bind}} throws an {{io.netty.channel.unix.Errors.NativeIoException}}
> instead of a {{java.net.BindException}} instance.
[jira] [Created] (SPARK-18383) Utils.isBindCollision does not properly handle all possible address-port collisions when binding
Guoqiang Li created SPARK-18383:
---
Summary: Utils.isBindCollision does not properly handle all possible address-port collisions when binding
Key: SPARK-18383
URL: https://issues.apache.org/jira/browse/SPARK-18383
Project: Spark
Issue Type: Bug
Reporter: Guoqiang Li

When the IO mode is set to epoll, Netty uses the {{io.netty.channel.unix.Socket}} class, and {{Socket.bind}} throws an {{io.netty.channel.unix.Errors.NativeIoException}} instead of a {{java.net.BindException}} instance.
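The gap described above can be illustrated with a small sketch. NativeIoExceptionStub below stands in for io.netty.channel.unix.Errors.NativeIoException so the example is self-contained, and both checks are illustrations of the idea, not Spark's actual Utils.isBindCollision code:

```java
import java.io.IOException;
import java.net.BindException;

class BindCollisionSketch {
    // Stand-in for io.netty.channel.unix.Errors.NativeIoException.
    static class NativeIoExceptionStub extends IOException {
        NativeIoExceptionStub(String message) { super(message); }
    }

    // A check that only matches java.net.BindException misses the epoll path.
    static boolean narrowCheck(Throwable e) {
        return e instanceof BindException;
    }

    // A broader check also recognizes the native bind error by its message.
    static boolean broadCheck(Throwable e) {
        if (e instanceof BindException) return true;
        return e instanceof NativeIoExceptionStub
                && e.getMessage() != null
                && e.getMessage().contains("Address already in use");
    }
}
```

Under the narrow check, an epoll bind collision is not recognized, so the caller never retries on the next port.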
[jira] [Updated] (SPARK-18375) Upgrade netty to 4.0.42.Final
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375:
Issue Type: Improvement (was: Bug)

> Upgrade netty to 4.0.42.Final
> --
>
> Key: SPARK-18375
> URL: https://issues.apache.org/jira/browse/SPARK-18375
> Project: Spark
> Issue Type: Improvement
> Components: Build, Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> One of the important changes in 4.0.42.Final is "Support any FileRegion
> implementation when using epoll transport
> [#5825|https://github.com/netty/netty/pull/5825]".
> In 4.0.42.Final,
> [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java]
> can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Updated] (SPARK-18375) Upgrade netty to 4.0.42.Final
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375:
Summary: Upgrade netty to 4.0.42.Final (was: Upgrade netty to 4.0.42)

> Upgrade netty to 4.0.42.Final
> --
>
> Key: SPARK-18375
> URL: https://issues.apache.org/jira/browse/SPARK-18375
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> One of the important changes in 4.0.42.Final is "Support any FileRegion
> implementation when using epoll transport
> [#5825|https://github.com/netty/netty/pull/5825]".
> In 4.0.42.Final,
> [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java]
> can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Updated] (SPARK-18375) Upgrade netty to 4.0.42
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375:
Summary: Upgrade netty to 4.0.42 (was: Upgrade netty to 4.042)

> Upgrade netty to 4.0.42
> ---
>
> Key: SPARK-18375
> URL: https://issues.apache.org/jira/browse/SPARK-18375
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> One of the important changes in 4.0.42.Final is "Support any FileRegion
> implementation when using epoll transport
> [#5825|https://github.com/netty/netty/pull/5825]".
> In 4.0.42.Final,
> [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java]
> can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Updated] (SPARK-18375) Upgrade netty to 4.042
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375:
Description:
One of the important changes in 4.0.42.Final is "Support any FileRegion implementation when using epoll transport [#5825|https://github.com/netty/netty/pull/5825]". In 4.0.42.Final, [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java] can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.

was:
The important change in 4.0.42.Final is "Support any FileRegion implementation when using epoll transport [#5825|https://github.com/netty/netty/pull/5825]". In 4.0.42.Final, [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java] can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.

> Upgrade netty to 4.042
> --
>
> Key: SPARK-18375
> URL: https://issues.apache.org/jira/browse/SPARK-18375
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> One of the important changes in 4.0.42.Final is "Support any FileRegion
> implementation when using epoll transport
> [#5825|https://github.com/netty/netty/pull/5825]".
> In 4.0.42.Final,
> [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java]
> can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Updated] (SPARK-18375) Upgrade netty to 4.042
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375:
Affects Version/s: 1.6.2

> Upgrade netty to 4.042
> --
>
> Key: SPARK-18375
> URL: https://issues.apache.org/jira/browse/SPARK-18375
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.6.2, 2.0.1
> Reporter: Guoqiang Li
>
> The important change in 4.0.42.Final is "Support any FileRegion
> implementation when using epoll transport
> [#5825|https://github.com/netty/netty/pull/5825]".
> In 4.0.42.Final,
> [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java]
> can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Created] (SPARK-18375) Upgrade netty to 4.042
Guoqiang Li created SPARK-18375:
---
Summary: Upgrade netty to 4.042
Key: SPARK-18375
URL: https://issues.apache.org/jira/browse/SPARK-18375
Project: Spark
Issue Type: Bug
Components: Build, Spark Core
Affects Versions: 2.0.1
Reporter: Guoqiang Li

The important change in 4.0.42.Final is "Support any FileRegion implementation when using epoll transport [#5825|https://github.com/netty/netty/pull/5825]". In 4.0.42.Final, [MessageWithHeader|https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java] can work properly when {{spark.(shuffle, rpc).io.mode}} is set to epoll.
[jira] [Commented] (SPARK-17930) The SerializerInstance instance used when deserializing a TaskResult is not reused
[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581119#comment-15581119 ] Guoqiang Li commented on SPARK-17930: - TPC-DS 2 TB data (Parquet) and the SQL (query 2): {noformat} select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, item, promotion where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_cdemo_sk = cd_demo_sk and ss_promo_sk = p_promo_sk and cd_gender = 'M' and cd_marital_status = 'M' and cd_education_status = '4 yr Degree' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 2001 group by i_item_id order by i_item_id limit 100; {noformat} spark-defaults.conf: {noformat} spark.master yarn-client spark.executor.instances 20 spark.driver.memory 16g spark.executor.memory 30g spark.executor.cores 5 spark.default.parallelism 100 spark.sql.shuffle.partitions 10 spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.maxResultSize 0 spark.rpc.netty.dispatcher.numThreads 8 spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M spark.cleaner.referenceTracking.blocking true spark.cleaner.referenceTracking.blocking.shuffle true {noformat} Performance test results are as follows: ||[SPARK-17930|https://github.com/witgo/spark/tree/SPARK-17930]||[ed14633|https://github.com/witgo/spark/commit/ed1463341455830b8867b721a1b34f291139baf3]|| |54.5 s|231.7 s| > The SerializerInstance instance used when deserializing a TaskResult is not > reused > --- > > Key: SPARK-17930 > URL: https://issues.apache.org/jira/browse/SPARK-17930 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.1 >Reporter: Guoqiang Li > > The following code is called when the DirectTaskResult instance is > deserialized > {noformat} > def value(): T = { > if (valueObjectDeserialized) { > valueObject > } else { > // Each deserialization creates a new instance of SerializerInstance, > which is very time-consuming > val resultSer = SparkEnv.get.serializer.newInstance() > valueObject = resultSer.deserialize(valueBytes) > valueObjectDeserialized = true > valueObject > } > } > {noformat}
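The fix the ticket points at is to construct the serializer once and reuse it across task results rather than calling {{newInstance()}} per result. A minimal Java sketch of that caching idea; the class names here ({{ExpensiveSerializer}}, {{TaskResultDecoder}}) are illustrative stand-ins, not Spark's actual SerializerInstance API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an expensive-to-construct serializer (e.g. a Kryo instance).
class ExpensiveSerializer {
    static final AtomicInteger constructed = new AtomicInteger();
    ExpensiveSerializer() { constructed.incrementAndGet(); }
    String deserialize(byte[] bytes) { return new String(bytes); }
}

// Reuses one serializer per thread instead of creating one per task result.
class TaskResultDecoder {
    private static final ThreadLocal<ExpensiveSerializer> SER =
        ThreadLocal.withInitial(ExpensiveSerializer::new);

    static String decode(byte[] bytes) {
        return SER.get().deserialize(bytes);
    }
}
```

Any number of results decoded on the same thread pay the construction cost only once; a thread-local keeps the sketch safe if results are deserialized from multiple scheduler threads.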
[jira] [Commented] (SPARK-17930) The SerializerInstance instance used when deserializing a TaskResult is not reused
[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577506#comment-15577506 ] Guoqiang Li commented on SPARK-17930: - If a stage contains a lot of tasks, e.g. one million tasks, the code here creates one million SerializerInstance instances, which seriously affects the performance of the DAGScheduler. At the very least we can reuse the SerializerInstance instance per stage. > The SerializerInstance instance used when deserializing a TaskResult is not > reused > --- > > Key: SPARK-17930 > URL: https://issues.apache.org/jira/browse/SPARK-17930 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.1, 2.0.1 >Reporter: Guoqiang Li > > The following code is called when the DirectTaskResult instance is > deserialized > {noformat} > def value(): T = { > if (valueObjectDeserialized) { > valueObject > } else { > // Each deserialization creates a new instance of SerializerInstance, > which is very time-consuming > val resultSer = SparkEnv.get.serializer.newInstance() > valueObject = resultSer.deserialize(valueBytes) > valueObjectDeserialized = true > valueObject > } > } > {noformat}
[jira] [Created] (SPARK-17931) taskScheduler has some unneeded serialization
Guoqiang Li created SPARK-17931: --- Summary: taskScheduler has some unneeded serialization Key: SPARK-17931 URL: https://issues.apache.org/jira/browse/SPARK-17931 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Guoqiang Li When the taskScheduler instantiates a TaskDescription, it calls `Task.serializeWithDependencies(task, sched.sc.addedFiles, sched.sc.addedJars, ser)`, which serializes the task and its dependencies. But after SPARK-2521 was merged into master, the ResultTask and ShuffleMapTask classes no longer contain rdd and closure objects. The TaskDescription class can be changed as below: {noformat} class TaskDescription[T]( val taskId: Long, val attemptNumber: Int, val executorId: String, val name: String, val index: Int, val task: Task[T]) extends Serializable {noformat}
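The proposal is that the description can simply carry the task object and be serialized in a single pass by the transport layer. A self-contained Java sketch of that shape; this is an illustrative stand-in for the Scala class above, not Spark's real TaskDescription, and it uses plain Java serialization where Spark would use its configured serializer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Slimmed-down description: the task itself is a field, so the whole object
// is serialized once instead of pre-serializing the task and its
// dependencies separately.
class TaskDescription<T> implements Serializable {
    final long taskId;
    final int attemptNumber;
    final String executorId;
    final String name;
    final int index;
    final T task;

    TaskDescription(long taskId, int attemptNumber, String executorId,
                    String name, int index, T task) {
        this.taskId = taskId;
        this.attemptNumber = attemptNumber;
        this.executorId = executorId;
        this.name = name;
        this.index = index;
        this.task = task;
    }

    // One serialization pass for the whole description.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(this);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> TaskDescription<T> fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (TaskDescription<T>) ois.readObject();
        }
    }
}
```

The task payload here is any Serializable value; in the real scheduler it would be the Task[T] instance, which after SPARK-2521 no longer drags the rdd and closure along with it.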
[jira] [Created] (SPARK-17930) The SerializerInstance instance used when deserializing a TaskResult is not reused
Guoqiang Li created SPARK-17930: --- Summary: The SerializerInstance instance used when deserializing a TaskResult is not reused Key: SPARK-17930 URL: https://issues.apache.org/jira/browse/SPARK-17930 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.1, 1.6.1 Reporter: Guoqiang Li The following code is called when the DirectTaskResult instance is deserialized {noformat} def value(): T = { if (valueObjectDeserialized) { valueObject } else { // Each deserialization creates a new instance of SerializerInstance, which is very time-consuming val resultSer = SparkEnv.get.serializer.newInstance() valueObject = resultSer.deserialize(valueBytes) valueObjectDeserialized = true valueObject } } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515042#comment-15515042 ] Guoqiang Li commented on SPARK-6235: ping [~rxin] > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > Attachments: SPARK-6235_Design_V0.02.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508670#comment-15508670 ] Guoqiang Li commented on SPARK-6235: [~rxin] Any comments? > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > Attachments: SPARK-6235_Design_V0.02.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-6235: --- Attachment: (was: SPARK-6235_Design_V0.01.pdf) > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > Attachments: SPARK-6235_Design_V0.02.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-6235: --- Attachment: SPARK-6235_Design_V0.02.pdf > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > Attachments: SPARK-6235_Design_V0.01.pdf, SPARK-6235_Design_V0.02.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-6235: --- Attachment: SPARK-6235_Design_V0.01.pdf Preliminary Design Document. > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > Attachments: SPARK-6235_Design_V0.01.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432284#comment-15432284 ] Guoqiang Li commented on SPARK-17082: - OK > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by > ChunkedByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17184) Replace ByteBuf with InputStream
[ https://issues.apache.org/jira/browse/SPARK-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432279#comment-15432279 ] Guoqiang Li commented on SPARK-17184: - ok > Replace ByteBuf with InputStream > > > Key: SPARK-17184 > URL: https://issues.apache.org/jira/browse/SPARK-17184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Guoqiang Li > > The size of ByteBuf can not be greater than 2G, should be replaced by > InputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17184) Replace ByteBuf with InputStream
[ https://issues.apache.org/jira/browse/SPARK-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431903#comment-15431903 ] Guoqiang Li commented on SPARK-17184: - OK, I'll post the detailed design document later. > Replace ByteBuf with InputStream > > > Key: SPARK-17184 > URL: https://issues.apache.org/jira/browse/SPARK-17184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Guoqiang Li > > The size of ByteBuf can not be greater than 2G, should be replaced by > InputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17184) Replace ByteBuf with InputStream
Guoqiang Li created SPARK-17184: --- Summary: Replace ByteBuf with InputStream Key: SPARK-17184 URL: https://issues.apache.org/jira/browse/SPARK-17184 Project: Spark Issue Type: Sub-task Components: Spark Core Reporter: Guoqiang Li The size of ByteBuf can not be greater than 2G, should be replaced by InputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by ChunkedByteBuffer (was: The size of ByteBuffers can not be greater than 2G, should be replaced by Java) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by > ChunkedByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by Java (was: the various 2G limit we have in Spark, due to the use of ByteBuffers.) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by Java -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Issue Type: Sub-task (was: New Feature) Parent: SPARK-6235 > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > the various 2G limit we have in Spark, due to the use of ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
Guoqiang Li created SPARK-17082: --- Summary: Replace ByteBuffer with ChunkedByteBuffer Key: SPARK-17082 URL: https://issues.apache.org/jira/browse/SPARK-17082 Project: Spark Issue Type: New Feature Components: Shuffle, Spark Core Reporter: Guoqiang Li the various 2G limit we have in Spark, due to the use of ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421048#comment-15421048 ] Guoqiang Li commented on SPARK-6235: Yes, it contains a lot of minor changes, e.g. replacing ByteBuffer with ChunkedByteBuffer. > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers.
[jira] [Comment Edited] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420554#comment-15420554 ] Guoqiang Li edited comment on SPARK-6235 at 8/15/16 1:53 AM: - [~hvanhovell] The main changes: 1. Replace the DiskStore method {{def getBytes(blockId: BlockId): ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}. 2. Make ManagedBuffer's {{nioByteBuffer}} method return a ChunkedByteBuffer. 3. Add the class {{ChunkFetchInputStream}}, used for flow control; the code is as follows: {noformat}
package org.apache.spark.network.client;

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.ClosedChannelException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.primitives.UnsignedBytes;
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.spark.network.buffer.ChunkedByteBuffer;
import org.apache.spark.network.buffer.ManagedBuffer;
import org.apache.spark.network.protocol.StreamChunkId;
import org.apache.spark.network.util.LimitedInputStream;
import org.apache.spark.network.util.TransportFrameDecoder;

public class ChunkFetchInputStream extends InputStream {
  private final Logger logger = LoggerFactory.getLogger(ChunkFetchInputStream.class);

  private final TransportResponseHandler handler;
  private final Channel channel;
  private final StreamChunkId streamId;
  private final long byteCount;
  private final ChunkReceivedCallback callback;
  private final LinkedBlockingQueue<ByteBuf> buffers = new LinkedBlockingQueue<>(1024);

  public final TransportFrameDecoder.Interceptor interceptor;

  private ByteBuf curChunk;
  private boolean isCallbacked = false;
  private long writerIndex = 0;
  private final AtomicReference<Throwable> cause = new AtomicReference<>(null);
  private final AtomicBoolean isClosed = new AtomicBoolean(false);

  public ChunkFetchInputStream(
      TransportResponseHandler handler,
      Channel channel,
      StreamChunkId streamId,
      long byteCount,
      ChunkReceivedCallback callback) {
    this.handler = handler;
    this.channel = channel;
    this.streamId = streamId;
    this.byteCount = byteCount;
    this.callback = callback;
    this.interceptor = new StreamInterceptor();
  }

  @Override
  public int read() throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      byte b = curChunk.readByte();
      return UnsignedBytes.toInt(b);
    } else {
      return -1;
    }
  }

  @Override
  public int read(byte[] dest, int offset, int length) throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      int amountToGet = Math.min(curChunk.readableBytes(), length);
      curChunk.readBytes(dest, offset, amountToGet);
      return amountToGet;
    } else {
      return -1;
    }
  }

  @Override
  public long skip(long bytes) throws IOException {
    if (isClosed.get()) return 0L;
    pullChunk();
    if (curChunk != null) {
      int amountToSkip = (int) Math.min(bytes, curChunk.readableBytes());
      curChunk.skipBytes(amountToSkip);
      return amountToSkip;
    } else {
      return 0L;
    }
  }

  @Override
  public void close() throws IOException {
    if (!isClosed.get()) {
      releaseCurChunk();
      isClosed.set(true);
      resetChannel();
      Iterator<ByteBuf> itr = buffers.iterator();
      while (itr.hasNext()) {
        itr.next().release();
      }
      buffers.clear();
    }
  }

  private void pullChunk() throws IOException {
    if (curChunk != null && !curChunk.isReadable()) releaseCurChunk();
    if (curChunk == null && cause.get() == null && !isClosed.get()) {
      try {
        curChunk = buffers.take();
        // if channel.read() will not be invoked automatically,
        // the method is called here
        if (!channel.config().isAutoRead()) channel.read();
      } catch (Throwable e) {
        setCause(e);
      }
    }
    if (cause.get() != null) throw new IOException(cause.get());
  }

  private void setCause(Throwable e) {
    if (cause.get() == null) cause.set(e);
  }

  private void releaseCurChunk() {
    if (curChunk != null) {
      curChunk.release();
      curChunk = null;
    }
  }

  private void onSuccess() throws IOException {
    if (isCallbacked) return;
    if (cause.get() != null) {
      callback.onFailure(streamId.chunkIndex, cause.get());
    } else {
      InputStream inputStream = new LimitedInputStream(this, byteCount);
      ManagedBuffer managedBuffer = new InputStreamManagedBuffer(inputStream, byteCount);
      callback.onSuccess(streamId.chunkIndex, managedBuffer);
    }
    isCallbacked = true;
  }

  private void resetChannel() { if
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420554#comment-15420554 ] Guoqiang Li commented on SPARK-6235: [~hvanhovell] The main changes: 1. Replace the DiskStore method {{def getBytes(blockId: BlockId): ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}. 2. Make ManagedBuffer's {{nioByteBuffer}} method return a ChunkedByteBuffer. 3. Add the class {{ChunkFetchInputStream}}, used for flow control; the code is as follows: {noformat}
package org.apache.spark.network.client;

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.ClosedChannelException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.primitives.UnsignedBytes;
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.spark.network.buffer.ChunkedByteBuffer;
import org.apache.spark.network.buffer.ManagedBuffer;
import org.apache.spark.network.protocol.StreamChunkId;
import org.apache.spark.network.util.LimitedInputStream;
import org.apache.spark.network.util.TransportFrameDecoder;

public class ChunkFetchInputStream extends InputStream {
  private final Logger logger = LoggerFactory.getLogger(ChunkFetchInputStream.class);

  private final TransportResponseHandler handler;
  private final Channel channel;
  private final StreamChunkId streamId;
  private final long byteCount;
  private final ChunkReceivedCallback callback;
  private final LinkedBlockingQueue<ByteBuf> buffers = new LinkedBlockingQueue<>(1024);

  public final TransportFrameDecoder.Interceptor interceptor;

  private ByteBuf curChunk;
  private boolean isCallbacked = false;
  private long writerIndex = 0;
  private final AtomicReference<Throwable> cause = new AtomicReference<>(null);
  private final AtomicBoolean isClosed = new AtomicBoolean(false);

  public ChunkFetchInputStream(
      TransportResponseHandler handler,
      Channel channel,
      StreamChunkId streamId,
      long byteCount,
      ChunkReceivedCallback callback) {
    this.handler = handler;
    this.channel = channel;
    this.streamId = streamId;
    this.byteCount = byteCount;
    this.callback = callback;
    this.interceptor = new StreamInterceptor();
  }

  @Override
  public int read() throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      byte b = curChunk.readByte();
      return UnsignedBytes.toInt(b);
    } else {
      return -1;
    }
  }

  @Override
  public int read(byte[] dest, int offset, int length) throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      int amountToGet = Math.min(curChunk.readableBytes(), length);
      curChunk.readBytes(dest, offset, amountToGet);
      return amountToGet;
    } else {
      return -1;
    }
  }

  @Override
  public long skip(long bytes) throws IOException {
    if (isClosed.get()) return 0L;
    pullChunk();
    if (curChunk != null) {
      int amountToSkip = (int) Math.min(bytes, curChunk.readableBytes());
      curChunk.skipBytes(amountToSkip);
      return amountToSkip;
    } else {
      return 0L;
    }
  }

  @Override
  public void close() throws IOException {
    if (!isClosed.get()) {
      releaseCurChunk();
      isClosed.set(true);
      resetChannel();
      Iterator<ByteBuf> itr = buffers.iterator();
      while (itr.hasNext()) {
        itr.next().release();
      }
      buffers.clear();
    }
  }

  private void pullChunk() throws IOException {
    if (curChunk != null && !curChunk.isReadable()) releaseCurChunk();
    if (curChunk == null && cause.get() == null && !isClosed.get()) {
      try {
        curChunk = buffers.take();
        // if channel.read() will not be invoked automatically,
        // the method is called here
        if (!channel.config().isAutoRead()) channel.read();
      } catch (Throwable e) {
        setCause(e);
      }
    }
    if (cause.get() != null) throw new IOException(cause.get());
  }

  private void setCause(Throwable e) {
    if (cause.get() == null) cause.set(e);
  }

  private void releaseCurChunk() {
    if (curChunk != null) {
      curChunk.release();
      curChunk = null;
    }
  }

  private void onSuccess() throws IOException {
    if (isCallbacked) return;
    if (cause.get() != null) {
      callback.onFailure(streamId.chunkIndex, cause.get());
    } else {
      InputStream inputStream = new LimitedInputStream(this, byteCount);
      ManagedBuffer managedBuffer = new InputStreamManagedBuffer(inputStream, byteCount);
      callback.onSuccess(streamId.chunkIndex, managedBuffer);
    }
    isCallbacked = true;
  }

  private void resetChannel() {
    if (!channel.config().isAutoRead()) {
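The flow-control mechanism in ChunkFetchInputStream, reduced to its essentials: network chunks land in a bounded queue, a reader drains them as an InputStream, and a full queue holds the producer back (which is what toggling the channel's auto-read achieves in the class above). A toy single-threaded sketch of that shape; all names are hypothetical, and real backpressure would of course involve a producer thread blocking on the full queue:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.io.InputStream;

// Chunks are offered into a small bounded queue; the reader consumes them
// one byte at a time, pulling the next chunk when the current one is drained.
class QueueInputStream extends InputStream {
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>(4);
    private byte[] cur;
    private int pos;
    private volatile boolean done;

    // Returns false when the queue is full -- the producer must back off,
    // analogous to disabling channel auto-read.
    boolean offer(byte[] chunk) { return chunks.offer(chunk); }

    void finish() { done = true; }

    @Override
    public int read() {
        while (true) {
            if (cur != null && pos < cur.length) {
                return cur[pos++] & 0xFF;
            }
            cur = chunks.poll();
            pos = 0;
            if (cur == null && done) return -1; // producer finished, stream ends
            if (cur == null) Thread.yield();    // sketch only: wait for producer
        }
    }
}
```

The bounded capacity (4 here, 1024 ByteBufs in the real class) is the whole point: it caps how much unread data can pile up in memory regardless of how fast the remote side sends.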
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418225#comment-15418225 ] Guoqiang Li commented on SPARK-6235: I'm doing this work and I'll put the patch in this month. > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14178) DAGScheduler should get map output statuses directly, not by MapOutputTrackerMaster.getSerializedMapOutputStatuses.
[ https://issues.apache.org/jira/browse/SPARK-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-14178: Summary: DAGScheduler should get map output statuses directly, not by MapOutputTrackerMaster.getSerializedMapOutputStatuses. (was: Compare Option[String] and String directly) > DAGScheduler should get map output statuses directly, not by > MapOutputTrackerMaster.getSerializedMapOutputStatuses. > --- > > Key: SPARK-14178 > URL: https://issues.apache.org/jira/browse/SPARK-14178 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Guoqiang Li > > DAGScheduler gets map output statuses by > {{MapOutputTrackerMaster.getSerializedMapOutputStatuses}}. > [DAGScheduler.scala#L357 | > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L357] > {noformat} > private def newOrUsedShuffleStage( > shuffleDep: ShuffleDependency[_, _, _], > firstJobId: Int): ShuffleMapStage = { > val rdd = shuffleDep.rdd > val numTasks = rdd.partitions.length > val stage = newShuffleMapStage(rdd, numTasks, shuffleDep, firstJobId, > rdd.creationSite) > if (mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) { > val serLocs = > mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) > // Deserialization very time consuming. 
> val locs = MapOutputTracker.deserializeMapStatuses(serLocs) > (0 until locs.length).foreach { i => > if (locs(i) ne null) { > // locs(i) will be null if missing > stage.addOutputLoc(i, locs(i)) > } > } > } else { > // Kind of ugly: need to register RDDs with the cache and map output > tracker here > // since we can't do it in the RDD constructor because # of partitions > is unknown > logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")") > mapOutputTracker.registerShuffle(shuffleDep.shuffleId, > rdd.partitions.length) > } > stage > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
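The retitled proposal amounts to letting the DAGScheduler read the map output statuses directly rather than paying a serialize-then-deserialize round trip on every stage submission. A hedged, self-contained sketch of the two access paths; this is a toy cache with illustrative names, not MapOutputTrackerMaster's real interface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy tracker: map statuses are modeled as arrays of host names.
class MapOutputStatuses {
    static final AtomicInteger deserializations = new AtomicInteger();
    private final Map<Integer, String[]> statusesByShuffle = new HashMap<>();

    void registerShuffle(int shuffleId, String[] locations) {
        statusesByShuffle.put(shuffleId, locations);
    }

    // Old path (simulated): callers get a serialized form and must
    // deserialize it again on every stage submission.
    String[] getDeserializedFromSerializedForm(int shuffleId) {
        deserializations.incrementAndGet(); // models the costly deserialize step
        return statusesByShuffle.get(shuffleId).clone();
    }

    // Proposed path: the scheduler reads the live objects directly.
    String[] getStatuses(int shuffleId) {
        return statusesByShuffle.get(shuffleId);
    }
}
```

With direct access the deserialization counter never moves for scheduler-local reads; the serialized form is still needed for shipping statuses to executors, which is why the real tracker keeps both.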
[jira] [Created] (SPARK-14178) Compare Option[String] and String directly
Guoqiang Li created SPARK-14178: --- Summary: Compare Option[String] and String directly Key: SPARK-14178 URL: https://issues.apache.org/jira/browse/SPARK-14178 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Guoqiang Li DAGScheduler gets map output statuses by {{MapOutputTrackerMaster.getSerializedMapOutputStatuses}}. [DAGScheduler.scala#L357 | https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L357] {noformat} private def newOrUsedShuffleStage( shuffleDep: ShuffleDependency[_, _, _], firstJobId: Int): ShuffleMapStage = { val rdd = shuffleDep.rdd val numTasks = rdd.partitions.length val stage = newShuffleMapStage(rdd, numTasks, shuffleDep, firstJobId, rdd.creationSite) if (mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) { val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) // Deserialization very time consuming. val locs = MapOutputTracker.deserializeMapStatuses(serLocs) (0 until locs.length).foreach { i => if (locs(i) ne null) { // locs(i) will be null if missing stage.addOutputLoc(i, locs(i)) } } } else { // Kind of ugly: need to register RDDs with the cache and map output tracker here // since we can't do it in the RDD constructor because # of partitions is unknown logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")") mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.length) } stage } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13371) TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152424#comment-15152424 ] Guoqiang Li commented on SPARK-13371: - [~srowen] Of course not. > TaskSetManager.dequeueSpeculativeTask compares Option[String] and String > directly. > -- > > Key: SPARK-13371 > URL: https://issues.apache.org/jira/browse/SPARK-13371 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.2, 1.6.0 >Reporter: Guoqiang Li > > {noformat} > TaskSetManager.dequeueSpeculativeTask compares Option[String] and String > directly. > {noformat} > The code: > https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344 > {code} > if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) { > for (rack <- sched.getRackForHost(host)) { > for (index <- speculatableTasks if canRunOnHost(index)) { > val racks = > tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost) > // racks: Seq[Option[String]] and rack: String > if (racks.contains(rack)) { > speculatableTasks -= index > return Some((index, TaskLocality.RACK_LOCAL)) > } > } > } > } > {code}
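The bug class is the same one Java's collection API has: `contains` accepts any Object, so comparing a wrapped value against an unwrapped one compiles cleanly and silently always returns false. The quoted Scala bug, transliterated into Java for illustration (`RackMatch` and its methods are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class RackMatch {
    // Buggy: Optional<String> elements can never equal a raw String,
    // so this is always false -- yet it compiles without warning.
    static boolean buggyMatch(List<Optional<String>> racks, String rack) {
        return racks.contains(rack);
    }

    // Fixed: wrap the argument so like is compared with like.
    static boolean fixedMatch(List<Optional<String>> racks, String rack) {
        return racks.contains(Optional.of(rack));
    }
}
```

In the Scala code above, the fix is analogous: compare against `Some(rack)` (or unwrap the elements) instead of letting `Seq[Option[String]].contains` accept a bare `String`.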
[jira] [Issue Comment Deleted] (SPARK-13371) TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-13371: Comment: was deleted (was: [~srowen] Of course not.)
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> --
>
> Key: SPARK-13371
> URL: https://issues.apache.org/jira/browse/SPARK-13371
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Guoqiang Li
>
> {noformat}
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> {noformat}
> The code:
> https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
> {code}
> if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
>   for (rack <- sched.getRackForHost(host)) {
>     for (index <- speculatableTasks if canRunOnHost(index)) {
>       val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
>       // racks: Seq[Option[String]] and rack: String
>       if (racks.contains(rack)) {
>         speculatableTasks -= index
>         return Some((index, TaskLocality.RACK_LOCAL))
>       }
>     }
>   }
> }
> {code}
[jira] [Commented] (SPARK-13371) TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152425#comment-15152425 ] Guoqiang Li commented on SPARK-13371: - [~srowen] Of course not.
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> --
>
> Key: SPARK-13371
> URL: https://issues.apache.org/jira/browse/SPARK-13371
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Guoqiang Li
>
> {noformat}
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> {noformat}
> The code:
> https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
> {code}
> if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
>   for (rack <- sched.getRackForHost(host)) {
>     for (index <- speculatableTasks if canRunOnHost(index)) {
>       val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
>       // racks: Seq[Option[String]] and rack: String
>       if (racks.contains(rack)) {
>         speculatableTasks -= index
>         return Some((index, TaskLocality.RACK_LOCAL))
>       }
>     }
>   }
> }
> {code}
[jira] [Updated] (SPARK-13371) TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-13371: Summary: TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly. (was: Compare Option[String] and String directly in )
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> --
>
> Key: SPARK-13371
> URL: https://issues.apache.org/jira/browse/SPARK-13371
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Guoqiang Li
>
> {noformat}
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> {noformat}
> The code:
> https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
> {code}
> if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
>   for (rack <- sched.getRackForHost(host)) {
>     for (index <- speculatableTasks if canRunOnHost(index)) {
>       val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
>       // racks: Seq[Option[String]] and rack: String
>       if (racks.contains(rack)) {
>         speculatableTasks -= index
>         return Some((index, TaskLocality.RACK_LOCAL))
>       }
>     }
>   }
> }
> {code}
[jira] [Updated] (SPARK-13371) Compare Option[String] and String directly in
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-13371: Summary: Compare Option[String] and String directly in (was: Compare Option[String] and String directly)
> Compare Option[String] and String directly in
> --
>
> Key: SPARK-13371
> URL: https://issues.apache.org/jira/browse/SPARK-13371
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Guoqiang Li
>
> {noformat}
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> {noformat}
> The code:
> https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
> {code}
> if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
>   for (rack <- sched.getRackForHost(host)) {
>     for (index <- speculatableTasks if canRunOnHost(index)) {
>       val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
>       // racks: Seq[Option[String]] and rack: String
>       if (racks.contains(rack)) {
>         speculatableTasks -= index
>         return Some((index, TaskLocality.RACK_LOCAL))
>       }
>     }
>   }
> }
> {code}
[jira] [Updated] (SPARK-13371) Compare Option[String] and String directly
[ https://issues.apache.org/jira/browse/SPARK-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-13371: Description:
{noformat}
TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
{noformat}
The code:
https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
{code}
if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
  for (rack <- sched.getRackForHost(host)) {
    for (index <- speculatableTasks if canRunOnHost(index)) {
      val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
      // racks: Seq[Option[String]] and rack: String
      if (racks.contains(rack)) {
        speculatableTasks -= index
        return Some((index, TaskLocality.RACK_LOCAL))
      }
    }
  }
}
{code}
was:
{noformat}
TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
{noformat}
The code:
https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
{code}
if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
  for (rack <- sched.getRackForHost(host)) {
    for (index <- speculatableTasks if canRunOnHost(index)) {
      val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
      // racks: Seq[Option[String]] and rack: String
      if (racks.contains(rack)) {
        speculatableTasks -= index
        return Some((index, TaskLocality.RACK_LOCAL))
      }
    }
  }
}
{code}
> Compare Option[String] and String directly
> --
>
> Key: SPARK-13371
> URL: https://issues.apache.org/jira/browse/SPARK-13371
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.2, 1.6.0
> Reporter: Guoqiang Li
>
> {noformat}
> TaskSetManager.dequeueSpeculativeTask compares Option[String] and String
> directly.
> {noformat}
> The code:
> https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
> {code}
> if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
>   for (rack <- sched.getRackForHost(host)) {
>     for (index <- speculatableTasks if canRunOnHost(index)) {
>       val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
>       // racks: Seq[Option[String]] and rack: String
>       if (racks.contains(rack)) {
>         speculatableTasks -= index
>         return Some((index, TaskLocality.RACK_LOCAL))
>       }
>     }
>   }
> }
> {code}
[jira] [Created] (SPARK-13371) Compare Option[String] and String directly
Guoqiang Li created SPARK-13371: --- Summary: Compare Option[String] and String directly Key: SPARK-13371 URL: https://issues.apache.org/jira/browse/SPARK-13371 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.6.0, 1.5.2 Reporter: Guoqiang Li
{noformat}
TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
{noformat}
The code:
https://github.com/apache/spark/blob/87abcf7df921a5937fdb2bae8bfb30bfabc4970a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L344
{code}
if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
  for (rack <- sched.getRackForHost(host)) {
    for (index <- speculatableTasks if canRunOnHost(index)) {
      val racks = tasks(index).preferredLocations.map(_.host).map(sched.getRackForHost)
      // racks: Seq[Option[String]] and rack: String
      if (racks.contains(rack)) {
        speculatableTasks -= index
        return Some((index, TaskLocality.RACK_LOCAL))
      }
    }
  }
}
{code}
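The comparison bug above is easy to reproduce outside Spark. A minimal standalone sketch (not Spark code): `Seq[A].contains` accepts any supertype of `A`, so comparing a `Seq[Option[String]]` against a bare `String` compiles but can never match.

```scala
val racks: Seq[Option[String]] = Seq(Some("rack1"), None, Some("rack2"))
val rack: String = "rack1"

// Compiles because contains[A1 >: A] widens the element type to Any,
// but an Option[String] is never equal to a String:
val buggy = racks.contains(rack)

// The fix: wrap the String so both sides have the same type:
val fixed = racks.contains(Some(rack))

println(buggy) // false
println(fixed) // true
```

This is why the speculative task in `dequeueSpeculativeTask` never matches its rack: the `if (racks.contains(rack))` branch is dead code until the String is wrapped in `Some`.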
[jira] [Commented] (SPARK-5575) Artificial neural networks for MLlib deep learning
[ https://issues.apache.org/jira/browse/SPARK-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984912#comment-14984912 ] Guoqiang Li commented on SPARK-5575: Hi [~bing], it is under development and is temporarily private.
> Artificial neural networks for MLlib deep learning
> --
>
> Key: SPARK-5575
> URL: https://issues.apache.org/jira/browse/SPARK-5575
> Project: Spark
> Issue Type: Umbrella
> Components: MLlib
> Affects Versions: 1.2.0
> Reporter: Alexander Ulanov
>
> Goal: Implement various types of artificial neural networks
> Motivation: deep learning trend
> Requirements:
> 1) Basic abstractions such as Neuron, Layer, Error, Regularization, Forward
> and Backpropagation etc. should be implemented as traits or interfaces, so
> they can be easily extended or reused
> 2) Implement complex abstractions, such as feed forward and recurrent networks
> 3) Implement multilayer perceptron (MLP), convolutional networks (LeNet),
> autoencoder (sparse and denoising), stacked autoencoder, restricted
> boltzmann machines (RBM), deep belief networks (DBN) etc.
> 4) Implement or reuse supporting constructs, such as classifiers, normalizers,
> poolers, etc.
[jira] [Comment Edited] (SPARK-5575) Artificial neural networks for MLlib deep learning
[ https://issues.apache.org/jira/browse/SPARK-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984809#comment-14984809 ] Guoqiang Li edited comment on SPARK-5575 at 11/2/15 6:59 AM: - Hi All, There is a MLP implementation with the Parameter Server and Spark. [MLP.scala|https://github.com/witgo/zen/blob/ps_mlp/ml/src/main/scala/com/github/cloudml/zen/ml/parameterserver/neuralNetwork/MLP.scala] was (Author: gq): Hi All, There is a MLP implementation with the Parameter Server and Spark. > Artificial neural networks for MLlib deep learning > -- > > Key: SPARK-5575 > URL: https://issues.apache.org/jira/browse/SPARK-5575 > Project: Spark > Issue Type: Umbrella > Components: MLlib >Affects Versions: 1.2.0 >Reporter: Alexander Ulanov > > Goal: Implement various types of artificial neural networks > Motivation: deep learning trend > Requirements: > 1) Basic abstractions such as Neuron, Layer, Error, Regularization, Forward > and Backpropagation etc. should be implemented as traits or interfaces, so > they can be easily extended or reused > 2) Implement complex abstractions, such as feed forward and recurrent networks > 3) Implement multilayer perceptron (MLP), convolutional networks (LeNet), > autoencoder (sparse and denoising), stacked autoencoder, restricted > boltzmann machines (RBM), deep belief networks (DBN) etc. > 4) Implement or reuse supporting constucts, such as classifiers, normalizers, > poolers, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5575) Artificial neural networks for MLlib deep learning
[ https://issues.apache.org/jira/browse/SPARK-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984809#comment-14984809 ] Guoqiang Li commented on SPARK-5575: Hi All, There is a MLP implementation with the Parameter Server and Spark. > Artificial neural networks for MLlib deep learning > -- > > Key: SPARK-5575 > URL: https://issues.apache.org/jira/browse/SPARK-5575 > Project: Spark > Issue Type: Umbrella > Components: MLlib >Affects Versions: 1.2.0 >Reporter: Alexander Ulanov > > Goal: Implement various types of artificial neural networks > Motivation: deep learning trend > Requirements: > 1) Basic abstractions such as Neuron, Layer, Error, Regularization, Forward > and Backpropagation etc. should be implemented as traits or interfaces, so > they can be easily extended or reused > 2) Implement complex abstractions, such as feed forward and recurrent networks > 3) Implement multilayer perceptron (MLP), convolutional networks (LeNet), > autoencoder (sparse and denoising), stacked autoencoder, restricted > boltzmann machines (RBM), deep belief networks (DBN) etc. > 4) Implement or reuse supporting constucts, such as classifiers, normalizers, > poolers, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10350) Fix SQL Programming Guide
Guoqiang Li created SPARK-10350: --- Summary: Fix SQL Programming Guide Key: SPARK-10350 URL: https://issues.apache.org/jira/browse/SPARK-10350 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Guoqiang Li Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: {{spark.sql.parquet.mergeSchema}} (was: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: [[spark.sql.parquet.mergeSchema]]) Fix SQL Programming Guide - Key: SPARK-10350 URL: https://issues.apache.org/jira/browse/SPARK-10350 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Guoqiang Li Priority: Minor [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: {{spark.sql.parquet.mergeSchema}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: [[spark.sql.parquet.mergeSchema]] Fix SQL Programming Guide - Key: SPARK-10350 URL: https://issues.apache.org/jira/browse/SPARK-10350 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Guoqiang Li Priority: Minor [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: [[spark.sql.parquet.mergeSchema]] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10350) Fix SQL Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-10350: Description: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95#diff-d8aa7a37d17a1227cba38c99f9f22511R1383] contains duplicate content: {{spark.sql.parquet.mergeSchema}} (was: [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95] contains duplicate content: {{spark.sql.parquet.mergeSchema}}) Fix SQL Programming Guide - Key: SPARK-10350 URL: https://issues.apache.org/jira/browse/SPARK-10350 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.5.0 Reporter: Guoqiang Li Priority: Minor [b93d99a|https://github.com/apache/spark/commit/b93d99ae21b8b3af1dd55775f77e5a9ddea48f95#diff-d8aa7a37d17a1227cba38c99f9f22511R1383] contains duplicate content: {{spark.sql.parquet.mergeSchema}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8276) NPE in YarnClientSchedulerBackend.stop
[ https://issues.apache.org/jira/browse/SPARK-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582185#comment-14582185 ] Guoqiang Li commented on SPARK-8276: Can you provide more of the error log for this issue?
NPE in YarnClientSchedulerBackend.stop
--
Key: SPARK-8276
URL: https://issues.apache.org/jira/browse/SPARK-8276
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 1.5.0
Reporter: Steve Loughran
Priority: Minor
NPE seen in {{YarnClientSchedulerBackend.stop()}} after a problem setting up the job; on the line {{monitorThread.interrupt()}}
[jira] [Updated] (SPARK-8273) Driver hangs up when yarn shutdown in client mode
[ https://issues.apache.org/jira/browse/SPARK-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-8273: --- Affects Version/s: 1.4.0 1.3.1
Driver hangs up when yarn shutdown in client mode
--
Key: SPARK-8273
URL: https://issues.apache.org/jira/browse/SPARK-8273
Project: Spark
Issue Type: Bug
Components: Spark Core, YARN
Affects Versions: 1.3.1, 1.4.0
Reporter: Tao Wang
In client mode, if YARN is shut down while a Spark application is running, the application hangs after several retries (default: 30) because the exception thrown by YarnClientImpl cannot be caught at the upper level; we should exit so that the user can be made aware of it.
[jira] [Updated] (SPARK-8273) Driver hangs up when yarn shutdown in client mode
[ https://issues.apache.org/jira/browse/SPARK-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-8273: --- Component/s: YARN
Driver hangs up when yarn shutdown in client mode
--
Key: SPARK-8273
URL: https://issues.apache.org/jira/browse/SPARK-8273
Project: Spark
Issue Type: Bug
Components: Spark Core, YARN
Reporter: Tao Wang
In client mode, if YARN is shut down while a Spark application is running, the application hangs after several retries (default: 30) because the exception thrown by YarnClientImpl cannot be caught at the upper level; we should exit so that the user can be made aware of it.
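The fix direction described in SPARK-8273 can be sketched as follows. This is a hedged illustration with hypothetical names, not Spark's actual YarnClientSchedulerBackend code: bound the retries and surface the failure to the caller so the driver exits instead of hanging.

```scala
// Hypothetical sketch: retry the ResourceManager connection a bounded number
// of times and report failure instead of swallowing the exception.
def connectWithRetries(connect: () => Boolean, maxRetries: Int): Boolean = {
  var attempt = 0
  while (attempt < maxRetries) {
    if (connect()) return true // connected; resume normal operation
    attempt += 1               // connection failed; try again
  }
  false                        // retries exhausted; caller must exit, not hang
}

val alwaysDown: () => Boolean = () => false // stand-in for an unreachable RM
if (!connectWithRetries(alwaysDown, 30)) {
  println("ResourceManager unreachable after 30 retries; exiting")
  // a real driver would call sys.exit(1) here rather than keep waiting
}
```

The key point is that the failure becomes an explicit return value (or rethrown exception) at the top level, so the driver can terminate deliberately.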
[jira] [Updated] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7934: --- Description: The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{ -XX:ConcGCThreadss=5}} is wrong. The logs: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
15/05/29 10:20:31 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:31 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:32 INFO Slf4jLogger: Slf4jLogger started 15/05/29 10:20:32 INFO Remoting: Starting remoting 15/05/29 10:20:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@10dian71.domain.test:55492] 15/05/29 10:20:33 INFO Utils: Successfully started service 'sparkDriver' on port 55492. 15/05/29 10:20:33 INFO SparkEnv: Registering MapOutputTracker 15/05/29 10:20:33 INFO SparkEnv: Registering BlockManagerMaster 15/05/29 10:20:33 INFO DiskBlockManager: Created local directory at /tmp/spark-94c41fce-1788-484e-9878-88d1bf8c7247/blockmgr-b3d7ba9d-6656-408f-b9e2-683784493f22 15/05/29 10:20:33 INFO MemoryStore: MemoryStore started with capacity 4.1 GB 15/05/29 10:20:34 INFO HttpFileServer: HTTP File server directory is /tmp/spark-271bab98-b4e8-4b02-8267-0020a38f355b/httpd-92bb8c15-51a7-4b40-9d01-2fb01cfbb148 15/05/29 10:20:34 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SocketConnector@0.0.0.0:38530 15/05/29 10:20:34 INFO Utils: Successfully started service 'HTTP file server' on port 38530. 15/05/29 10:20:34 INFO SparkEnv: Registering OutputCommitCoordinator 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 15/05/29 10:20:34 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/05/29 10:20:34 INFO SparkUI: Started SparkUI at http://10dian71.domain.test:4040 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/spark-1.3.0-cdh5/lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar at http://192.168.10.71:38530/jars/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar with timestamp 1432866034769 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/classes/toona-assembly.jar at http://192.168.10.71:38530/jars/toona-assembly.jar with timestamp 1432866034972 15/05/29 10:20:35 INFO RMProxy: Connecting to ResourceManager at 10dian72/192.168.10.72:9080 15/05/29 10:20:36 INFO Client: Requesting a new application from cluster with 9 NodeManagers 15/05/29 10:20:36 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (10240 MB per container) 15/05/29 10:20:36 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 15/05/29 10:20:36 INFO Client: Setting up container launch context for our AM 15/05/29 10:20:36 INFO Client: Preparing resources for our AM container 15/05/29 10:20:37 INFO Client: Uploading resource file:/opt/spark/spark-1.3.0-cdh5/lib/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar - hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar 15/05/29 10:20:39 INFO Client: Uploading resource hdfs://ns1:8020/input/lbs/recommend/toona/spark/conf - hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/conf 15/05/29 10:20:41 INFO Client: Setting up the launch environment for our AM container 15/05/29 10:20:42 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:42 INFO SecurityManager: Changing modify acls to: spark 15/05/29
[jira] [Updated] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7934: --- Description: The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{ -XX:ConcGCThreadss=5}} is wrong. The logs: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
15/05/29 10:20:31 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:31 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:32 INFO Slf4jLogger: Slf4jLogger started 15/05/29 10:20:32 INFO Remoting: Starting remoting 15/05/29 10:20:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@10dian71.domain.test:55492] 15/05/29 10:20:33 INFO Utils: Successfully started service 'sparkDriver' on port 55492. 15/05/29 10:20:33 INFO SparkEnv: Registering MapOutputTracker 15/05/29 10:20:33 INFO SparkEnv: Registering BlockManagerMaster 15/05/29 10:20:33 INFO DiskBlockManager: Created local directory at /tmp/spark-94c41fce-1788-484e-9878-88d1bf8c7247/blockmgr-b3d7ba9d-6656-408f-b9e2-683784493f22 15/05/29 10:20:33 INFO MemoryStore: MemoryStore started with capacity 4.1 GB 15/05/29 10:20:34 INFO HttpFileServer: HTTP File server directory is /tmp/spark-271bab98-b4e8-4b02-8267-0020a38f355b/httpd-92bb8c15-51a7-4b40-9d01-2fb01cfbb148 15/05/29 10:20:34 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SocketConnector@0.0.0.0:38530 15/05/29 10:20:34 INFO Utils: Successfully started service 'HTTP file server' on port 38530. 15/05/29 10:20:34 INFO SparkEnv: Registering OutputCommitCoordinator 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 15/05/29 10:20:34 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/05/29 10:20:34 INFO SparkUI: Started SparkUI at http://10dian71.domain.test:4040 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/spark-1.3.0-cdh5/lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar at http://192.168.10.71:38530/jars/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar with timestamp 1432866034769 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/classes/toona-assembly.jar at http://192.168.10.71:38530/jars/toona-assembly.jar with timestamp 1432866034972 15/05/29 10:20:35 INFO RMProxy: Connecting to ResourceManager at 10dian72/192.168.10.72:9080 15/05/29 10:20:36 INFO Client: Requesting a new application from cluster with 9 NodeManagers 15/05/29 10:20:36 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (10240 MB per container) 15/05/29 10:20:36 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 15/05/29 10:20:36 INFO Client: Setting up container launch context for our AM 15/05/29 10:20:36 INFO Client: Preparing resources for our AM container 15/05/29 10:20:37 INFO Client: Uploading resource file:/opt/spark/spark-1.3.0-cdh5/lib/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar - hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar 15/05/29 10:20:39 INFO Client: Uploading resource hdfs://ns1:8020/input/lbs/recommend/toona/spark/conf - hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/conf 15/05/29 10:20:41 INFO Client: Setting up the launch environment for our AM container 15/05/29 10:20:42 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:42 INFO SecurityManager: Changing modify acls to: spark 15/05/29
[jira] [Commented] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564386#comment-14564386 ] Guoqiang Li commented on SPARK-7934: OK, the information is a little less, but I can only get these. If I can reproduce the bug in test cluster, I will reopen this jira. In some cases, Spark hangs in yarn-client mode. --- Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The log: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
15/05/29 10:20:31 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:31 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:32 INFO Slf4jLogger: Slf4jLogger started 15/05/29 10:20:32 INFO Remoting: Starting remoting 15/05/29 10:20:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@10dian71.domain.test:55492] 15/05/29 10:20:33 INFO Utils: Successfully started service 'sparkDriver' on port 55492. 15/05/29 10:20:33 INFO SparkEnv: Registering MapOutputTracker 15/05/29 10:20:33 INFO SparkEnv: Registering BlockManagerMaster 15/05/29 10:20:33 INFO DiskBlockManager: Created local directory at /tmp/spark-94c41fce-1788-484e-9878-88d1bf8c7247/blockmgr-b3d7ba9d-6656-408f-b9e2-683784493f22 15/05/29 10:20:33 INFO MemoryStore: MemoryStore started with capacity 4.1 GB 15/05/29 10:20:34 INFO HttpFileServer: HTTP File server directory is /tmp/spark-271bab98-b4e8-4b02-8267-0020a38f355b/httpd-92bb8c15-51a7-4b40-9d01-2fb01cfbb148 15/05/29 10:20:34 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SocketConnector@0.0.0.0:38530 15/05/29 10:20:34 INFO Utils: Successfully started service 'HTTP file server' on port 38530. 15/05/29 10:20:34 INFO SparkEnv: Registering OutputCommitCoordinator 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 15/05/29 10:20:34 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/05/29 10:20:34 INFO SparkUI: Started SparkUI at http://10dian71.domain.test:4040 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/spark-1.3.0-cdh5/lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar at http://192.168.10.71:38530/jars/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar with timestamp 1432866034769 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/classes/toona-assembly.jar at http://192.168.10.71:38530/jars/toona-assembly.jar with timestamp 1432866034972 15/05/29 10:20:35 INFO RMProxy: Connecting to ResourceManager at 10dian72/192.168.10.72:9080 15/05/29 10:20:36 INFO Client: Requesting a new application from cluster with 9 NodeManagers 15/05/29 10:20:36 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (10240 MB per container) 15/05/29 10:20:36 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 15/05/29 10:20:36 INFO Client: Setting up container launch context for our AM 15/05/29 10:20:36 INFO Client: Preparing resources for our AM container 15/05/29 10:20:37 INFO Client: Uploading resource file:/opt/spark/spark-1.3.0-cdh5/lib/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar - hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar
[jira] [Updated] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7934: --- Description: The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{-XX:ConcGCThreadss=5}} is wrong. {{HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ YARN_CONF_DIR= ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar}} = {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
[jira] [Commented] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564409#comment-14564409 ] Guoqiang Li commented on SPARK-7934: cc [~srowen] In some cases, Spark hangs in yarn-client mode. --- Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{-XX:ConcGCThreadss=5}} is wrong. {{HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ YARN_CONF_DIR= ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar}} = {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
[jira] [Reopened] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li reopened SPARK-7934: In some cases, Spark hangs in yarn-client mode. --- Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{ -XX:ConcGCThreadss=5}} is wrong. The logs: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 
[jira] [Commented] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
[ https://issues.apache.org/jira/browse/SPARK-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564562#comment-14564562 ] Guoqiang Li commented on SPARK-7934: [~srowen] My point is that when the executor process cannot start for some reason, Spark should not hang. In some cases, Spark hangs in yarn-client mode. --- Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The conf/spark-defaults.conf {noformat} spark.executor.extraJavaOptions -XX:+UseG1GC -XX:ConcGCThreadss=5 -XX:MaxPermSize=512m -Xss2m {noformat} Note: {{-XX:ConcGCThreadss=5}} is wrong (the correct option name is {{-XX:ConcGCThreads}}). {{HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ YARN_CONF_DIR= ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar}} = {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. 
[jira] [Created] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
Guoqiang Li created SPARK-7934: -- Summary: In some cases, Spark hangs in yarn-client mode. Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The log: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 15/05/29 10:20:31 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:31 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:32 INFO Slf4jLogger: Slf4jLogger started 15/05/29 10:20:32 INFO Remoting: Starting remoting 15/05/29 10:20:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@10dian71.domain.test:55492] 15/05/29 10:20:33 INFO Utils: Successfully started service 'sparkDriver' on port 55492. 
[jira] [Commented] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530029#comment-14530029 ] Guoqiang Li commented on SPARK-5556: [FastLDA|https://github.com/witgo/zen/blob/1c0f6c63a0b67569aeefba3f767acf1ac93c7a7c/ml/src/main/scala/com/github/cloudml/zen/ml/clustering/LDA.scala#L553]: Gibbs sampling; the computational complexity is O(n_dk), where n_dk is the number of unique topics in document d. I recommend it for short texts. [LightLDA|https://github.com/witgo/zen/blob/1c0f6c63a0b67569aeefba3f767acf1ac93c7a7c/ml/src/main/scala/com/github/cloudml/zen/ml/clustering/LDA.scala#L763]: Metropolis-Hastings sampling; the computational complexity is O(1) (it depends on the partition strategy and takes up more memory). Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx, spark-summit.pptx -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
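The complexity comparison above can be illustrated with a toy collapsed Gibbs sampler for LDA. This is a minimal Python sketch, not the Zen code; all names are mine. The point of interest is the full conditional: its (n_dk + alpha) factor is sparse over the topics actually present in document d, which is what FastLDA-style samplers exploit to get O(n_dk) per-token cost instead of O(K).

```python
import random
from collections import defaultdict

random.seed(0)

# Toy corpus: each document is a list of word ids.
docs = [[0, 1, 2, 0], [2, 3, 3, 1], [0, 0, 1, 3]]
V, K = 4, 2              # vocabulary size, number of topics
alpha, beta = 0.5, 0.1   # symmetric Dirichlet priors

# Count tables maintained by collapsed Gibbs sampling.
z = [[random.randrange(K) for _ in d] for d in docs]   # topic assignments
n_dk = [defaultdict(int) for _ in docs]                # doc-topic counts (sparse)
n_kw = [[0] * V for _ in range(K)]                     # topic-word counts
n_k = [0] * K                                          # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

def sample_topic(d, w):
    # Full conditional p(z=k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
    # Only topics present in document d contribute more than the constant alpha
    # term, which FastLDA-style samplers exploit for O(n_dk) per-token cost.
    weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
               for k in range(K)]
    return random.choices(range(K), weights=weights)[0]

for _ in range(20):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                  # remove the current assignment
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            k = sample_topic(d, w)       # resample from the full conditional
            z[d][i] = k
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

print(sum(n_k))  # -> 12, the total token count (invariant across sweeps)
```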
[jira] [Reopened] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li reopened SPARK-7008: This JIRA should not be closed. An implementation of Factorization Machine (LibFM) -- Key: SPARK-7008 URL: https://issues.apache.org/jira/browse/SPARK-7008 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.3.0, 1.3.1, 1.3.2 Reporter: zhengruifeng Labels: features, patch Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png An implementation of Factorization Machines based on Scala and Spark MLlib. FM is a machine learning algorithm for multi-linear regression and is widely used in recommendation systems. FM has performed well in recent recommendation competitions. Ref: http://libfm.org/ http://doi.acm.org/10.1145/2168752.2168771 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
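The heart of any LibFM-style implementation is Rendle's linear-time prediction trick (from the Rendle 2010 paper linked above): the pairwise interaction term can be rewritten per factor dimension so prediction costs O(kn) instead of O(kn^2). A self-contained Python sketch, with names of my own choosing rather than from the attached patch, that checks the rewritten form against the naive pairwise sum:

```python
import random

random.seed(0)
n, k = 6, 3                                    # number of features, factor dimension
w0 = 0.1                                       # global bias
w = [random.gauss(0, 1) for _ in range(n)]     # linear weights
V = [[random.gauss(0, 1) for _ in range(k)]    # factor matrix: one k-dim
     for _ in range(n)]                        # latent vector per feature

def fm_predict(x):
    # Rendle's identity: for each factor dimension f, the pairwise term equals
    # 0.5 * ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2), giving O(k*n) cost.
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum(V[i][f] ** 2 * x[i] ** 2 for i in range(n))
        y += 0.5 * (s * s - s2)
    return y

def fm_predict_naive(x):
    # Direct O(k*n^2) sum over feature pairs, used only to check the identity.
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            y += dot * x[i] * x[j]
    return y

x = [random.gauss(0, 1) for _ in range(n)]
assert abs(fm_predict(x) - fm_predict_naive(x)) < 1e-9
```

The same identity is what makes SGD training of FM linear in the number of non-zero features, which matters for the sparse inputs typical of recommendation data.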
[jira] [Commented] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518602#comment-14518602 ] Guoqiang Li commented on SPARK-5556: I have put the latest LDA code in [Zen|https://github.com/witgo/zen/tree/master/ml/src/main/scala/com/github/cloudml/zen/ml/clustering]. The test results are [here|https://issues.apache.org/jira/secure/attachment/12729030/LDA_test.xlsx] (72 cores, 216 GB RAM, 6 servers, Gigabit Ethernet). Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx
[jira] [Updated] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5556: --- Attachment: LDA_test.xlsx Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx
[jira] [Updated] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5556: --- Attachment: spark-summit.pptx Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx, spark-summit.pptx
[jira] [Commented] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518618#comment-14518618 ] Guoqiang Li commented on SPARK-5556: LDA_Gibbs combines the advantages of the AliasLDA, FastLDA and SparseLDA algorithms. The corresponding code is at https://github.com/witgo/spark/tree/lda_Gibbs or https://github.com/witgo/zen/blob/master/ml/src/main/scala/com/github/cloudml/zen/ml/clustering/LDA.scala#L553. Yes, LightLDA converges faster, but it takes up more memory. Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx, spark-summit.pptx
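For context on the sampler families named above: AliasLDA's O(1) per-token topic draws rest on Walker's alias method, which precomputes a table so a discrete distribution over n topics can be sampled with one uniform index and one coin flip. A minimal Python sketch (illustrative only; the function names are mine, not the Zen code):

```python
import random

def build_alias_table(probs):
    """Build Walker's alias table for a normalized discrete distribution, O(n)."""
    n = len(probs)
    scaled = [p * n for p in probs]                 # rescale so the mean weight is 1
    alias, prob = [0] * n, [0.0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l            # fill s's deficit from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                         # leftovers are (numerically) 1
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """Draw one sample in O(1): pick a column uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The table costs O(n) to build, so it pays off when, as in LDA, the same distribution is sampled many times before it changes.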
[jira] [Commented] (SPARK-5556) Latent Dirichlet Allocation (LDA) using Gibbs sampler
[ https://issues.apache.org/jira/browse/SPARK-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518621#comment-14518621 ] Guoqiang Li commented on SPARK-5556: [spark-summit.pptx|https://issues.apache.org/jira/secure/attachment/12729035/spark-summit.pptx] introduces the relevant algorithms. Latent Dirichlet Allocation (LDA) using Gibbs sampler -- Key: SPARK-5556 URL: https://issues.apache.org/jira/browse/SPARK-5556 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Guoqiang Li Assignee: Pedro Rodriguez Attachments: LDA_test.xlsx, spark-summit.pptx
[jira] [Updated] (SPARK-7162) spark-shell launcher failure in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7162: --- Description: {code:none} HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar {code} = {code:none} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode. java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:973) at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157) at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106) at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64) at
[jira] [Updated] (SPARK-7162) spark-shell launcher failure in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7162: --- Description: {code:none} HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar {code} = {code:none} _LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar,/opt/spark/classes/toona-assembly.jar Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode. java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at
org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at 
org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:973) at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157) at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106) at
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513856#comment-14513856 ] Guoqiang Li commented on SPARK-7008: [~mengxr] what's your view on what [~podongfeng] said? An implementation of Factorization Machine (LibFM) -- Key: SPARK-7008 URL: https://issues.apache.org/jira/browse/SPARK-7008 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.3.0, 1.3.1, 1.3.2 Reporter: zhengruifeng Labels: features, patch Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png An implementation of Factorization Machines based on Scala and Spark MLlib. FM is a kind of machine learning algorithm for multi-linear regression, and is widely used for recommendation. FM has worked well in recent years' recommendation competitions. Ref: http://libfm.org/ http://doi.acm.org/10.1145/2168752.2168771 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
[jira] [Updated] (SPARK-7162) Launcher error in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7162: --- Summary: Launcher error in yarn-client (was: spark-shell launcher error in yarn-client) Launcher error in yarn-client - Key: SPARK-7162 URL: https://issues.apache.org/jira/browse/SPARK-7162 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.4.0 Reporter: Guoqiang Li {code:none} HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar {code} = {code:none} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at
[jira] [Updated] (SPARK-7162) spark-shell launcher failure in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7162: --- Description: {code:none} HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar,/opt/spark/classes/toona-assembly.jar {code} = {code:none} _LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar,/opt/spark/classes/toona-assembly.jar Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode. java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at
org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at 
org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:973) at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157) at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106) at
[jira] [Created] (SPARK-7162) spark-shell launcher failure in yarn-client
Guoqiang Li created SPARK-7162: -- Summary: spark-shell launcher failure in yarn-client Key: SPARK-7162 URL: https://issues.apache.org/jira/browse/SPARK-7162 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.4.0 Reporter: Guoqiang Li The log: {code:none} _LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar,/opt/spark/classes/toona-assembly.jar Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode. java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:973) at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157) at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106) at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64) at
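The traces above all fail inside Client.createConfArchive, which iterates over HADOOP_CONF_DIR and tries to copy every entry as a file, tripping over the /etc/hadoop/hadoop subdirectory. A hedged Python sketch of the missing guard (illustrative only; the actual Spark code is Scala, and this is not the patch that fixed the issue):

```python
import os
import zipfile

def archive_conf_dir(conf_dir, archive_path):
    """Zip only the regular files under a Hadoop conf dir, skipping
    subdirectories such as /etc/hadoop/hadoop that break a naive copy."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in sorted(os.listdir(conf_dir)):
            path = os.path.join(conf_dir, name)
            if os.path.isfile(path):       # the missing guard: directories are skipped
                zf.write(path, arcname=name)
    return archive_path
```

Without the `isfile` check, opening a directory raises exactly the FileNotFoundException ("is a directory") seen in the report.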
[jira] [Updated] (SPARK-7162) spark-shell launcher error in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7162: --- Summary: spark-shell launcher error in yarn-client (was: spark-shell launcher failure in yarn-client) spark-shell launcher error in yarn-client -- Key: SPARK-7162 URL: https://issues.apache.org/jira/browse/SPARK-7162 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.4.0 Reporter: Guoqiang Li {code:none} HADOOP_CONF_DIR=/usr/local/CDH5/hadoop-2.3.0-cdh5.0.1/etc/hadoop/ ./bin/spark-shell --master yarn-client --driver-memory 8g --driver-library-path $LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH --jars lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar {code} = {code:none} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.4.0-SNAPSHOT /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) Type in expressions to have them evaluated. Type :help for more information. spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
java.io.FileNotFoundException: /etc/hadoop/hadoop (is a directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124) at org.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114) at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:182) at org.spark-project.guava.io.Files.copy(Files.java:417) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:374) at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$2.apply(Client.scala:372) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:372) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:288) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:466) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:106) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.init(SparkContext.scala:469) at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1016) at $iwC$$iwC.init(console:9) at $iwC.init(console:18) at init(console:20) at .init(console:24) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123) at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) at
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512238#comment-14512238 ] Guoqiang Li commented on SPARK-7008: In practice, compared to {{LBFGS}}, {{SGD + AdaGrad}} converges faster and to a better solution. An implementation of Factorization Machine (LibFM) -- Key: SPARK-7008 URL: https://issues.apache.org/jira/browse/SPARK-7008 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.3.0, 1.3.1, 1.3.2 Reporter: zhengruifeng Labels: features, patch Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png An implementation of Factorization Machines based on Scala and Spark MLlib. FM is a kind of machine learning algorithm for multi-linear regression, and is widely used for recommendation. FM has worked well in recent years' recommendation competitions. Ref: http://libfm.org/ http://doi.acm.org/10.1145/2168752.2168771 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
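For reference, the {{SGD + AdaGrad}} scheme the comment compares against {{LBFGS}} scales each coordinate's step by the inverse root of its accumulated squared gradients, which suits sparse FM-style features. A minimal Python sketch (hyperparameters are illustrative guesses, not the values used in the reported tests):

```python
import math

def adagrad_step(w, g, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-coordinate learning rates shrink as the
    squared-gradient accumulator for that coordinate grows."""
    for i, gi in enumerate(g):
        accum[i] += gi * gi
        w[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return w, accum
```

Rarely-updated coordinates keep a relatively large effective learning rate, which is one reason the combination can converge faster than a batch method on sparse data.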
[jira] [Comment Edited] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504435#comment-14504435 ] Guoqiang Li edited comment on SPARK-7008 at 4/25/15 4:16 AM: - This is my test chart on convergence rate (binary classification). The convergence curves for binary classification are plotted in the attached [FM_convergence_rate.xlsx|https://issues.apache.org/jira/secure/attachment/12726809/FM_convergence_rate.xlsx] url: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/url_combined.bz2 kdda: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/kdda.bz2 was (Author: gq): This is my test chart on convergence rate (binary classification). url: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/url_combined.bz2 kdda: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/kdda.bz2 An implementation of Factorization Machine (LibFM) -- Key: SPARK-7008 URL: https://issues.apache.org/jira/browse/SPARK-7008 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.3.0, 1.3.1, 1.3.2 Reporter: zhengruifeng Labels: features, patch Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png An implementation of Factorization Machines based on Scala and Spark MLlib. FM is a kind of machine learning algorithm for multi-linear regression, and is widely used for recommendation. FM has worked well in recent years' recommendation competitions. Ref: http://libfm.org/ http://doi.acm.org/10.1145/2168752.2168771 http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
[jira] [Updated] (SPARK-7008) An Implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-7008: --- Attachment: FM_convergence_rate.xlsx QQ20150421-1.png QQ20150421-2.png This is my test chart on convergence rate (binary classification). url: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/url_combined.bz2 kdda: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/kdda.bz2
[jira] [Comment Edited] (SPARK-7008) Implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502616#comment-14502616 ] Guoqiang Li edited comment on SPARK-7008 at 4/20/15 10:34 AM: -- Here's a GraphX-based implementation (WIP): https://github.com/witgo/zen/tree/FactorizationMachine was (Author: gq): Here's a GraphX-based implementation: https://github.com/witgo/zen/tree/FactorizationMachine
[jira] [Commented] (SPARK-6932) A Prototype of Parameter Server
[ https://issues.apache.org/jira/browse/SPARK-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502332#comment-14502332 ] Guoqiang Li commented on SPARK-6932: I have three questions: 1. {code:none} def update[T](key: String, delta: T, reduceFunc: (T, T) => T): Unit{code} If the reduceFunc relies on other resources (parameters), what should we do (e.g. Sparse Group Restricted Boltzmann Machines, AdaGrad)? 2. Does using String keys affect performance? 3. How can we fetch only part of the parameters (to exploit sample sparsity for faster computation)? A Prototype of Parameter Server --- Key: SPARK-6932 URL: https://issues.apache.org/jira/browse/SPARK-6932 Project: Spark Issue Type: New Feature Components: ML, MLlib, Spark Core Reporter: Qiping Li h2. Introduction As specified in [SPARK-4590|https://issues.apache.org/jira/browse/SPARK-4590], it would be very helpful to integrate a parameter server into Spark for machine learning algorithms, especially those with ultra-high-dimensional features. After carefully studying the design doc of [Parameter Servers|https://docs.google.com/document/d/1SX3nkmF41wFXAAIr9BgqvrHSS5mW362fJ7roBXJm06o/edit?usp=sharing] and the paper on [Factorbird|http://stanford.edu/~rezab/papers/factorbird.pdf], we propose a prototype of Parameter Server on Spark (PS-on-Spark), with several key design concerns: * *User-friendly interface* Most existing Parameter Server systems (including [petuum|http://petuum.github.io], [parameter server|http://parameterserver.org], and [paracel|https://github.com/douban/paracel]) were investigated carefully, and a user-friendly interface was designed by absorbing the essence of all these systems. * *Prototype of distributed array* IndexedRDD (see [SPARK-4590|https://issues.apache.org/jira/browse/SPARK-4590]) doesn't seem to be a good option for a distributed array, because in most cases the number of key updates per second is not very high.
So we implement a distributed HashMap to store the parameters, which can easily be extended for better performance. * *Minimal code change* Quite a lot of effort is spent avoiding changes to Spark core. Tasks which need the parameter server are still created and scheduled by Spark's scheduler. Tasks communicate with the parameter server through a client object, over *akka* or *netty*. With all these concerns we propose the following architecture: h2. Architecture !https://cloud.githubusercontent.com/assets/1285855/7158179/f2d25cc4-e3a9-11e4-835e-89681596c478.jpg! Data is stored in an RDD and is partitioned across workers. During each iteration, each worker gets parameters from the parameter server, then computes new parameters based on the old parameters and the data in its partition. Finally, each worker pushes its parameter updates to the parameter server. A worker communicates with the parameter server through a parameter server client, which is initialized in the `TaskContext` of that worker. The current implementation is based on YARN cluster mode, but it should not be a problem to port it to other modes. h3. Interface We referred to existing parameter server systems (petuum, parameter server, paracel) when designing the interface of the parameter server. *`PSClient` provides the following interface for workers to use:* {code} // get parameter indexed by key from parameter server def get[T](key: String): T // get multiple parameters from parameter server def multiGet[T](keys: Array[String]): Array[T] // add `delta` to the parameter indexed by `key`; // if there are multiple `delta`s to apply to the same parameter, // use `reduceFunc` to reduce these `delta`s first. def update[T](key: String, delta: T, reduceFunc: (T, T) => T): Unit // update multiple parameters at the same time, using the same `reduceFunc`. def multiUpdate[T](keys: Array[String], delta: Array[T], reduceFunc: (T, T) => T): Unit // advance clock to indicate that the current iteration is finished.
def clock(): Unit // block until all workers have reached this line of code. def sync(): Unit {code} *`PSContext` provides the following functions for use on the driver:* {code} // load parameters from an existing RDD. def loadPSModel[T](model: RDD[(String, T)]) // fetch parameters from the parameter server to construct a model. def fetchPSModel[T](keys: Array[String]): Array[T] {code} *A new function has been added to `RDD` to run parameter server tasks:* {code} // run the provided `func` on each partition of this RDD. // This function can use the data of this partition (the first argument) // and a parameter server client (the second argument). // See the following Logistic Regression for an example. def runWithPS[U: ClassTag](func: (Array[T], PSClient) => U): Array[U] {code} h2. Example Here is an example of using our
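The example the proposal refers to is cut off above. As a hedged sketch only — `runWithPS`, `PSClient.get`, `update`, and `clock` are the proposed interface quoted above, not an existing Spark API, and `computeGradient` is a hypothetical user-defined helper — one logistic-regression-style iteration might look like:

{code}
// Hypothetical use of the proposed PS-on-Spark interface (not a real Spark API).
// Each worker pulls the weight vector, computes a gradient over its partition,
// and pushes the delta back; concurrent deltas are reduced by element-wise sum.
val weightsKey = "lr.weights"
data.runWithPS[Unit] { (points: Array[LabeledPoint], ps: PSClient) =>
  val w = ps.get[Array[Double]](weightsKey)            // pull current parameters
  val grad = computeGradient(points, w)                // local gradient (user-defined)
  ps.update[Array[Double]](weightsKey, grad,
    (a, b) => a.zip(b).map { case (x, y) => x + y })   // reduce concurrent deltas
  ps.clock()                                           // mark iteration finished
}
{code}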
[jira] [Commented] (SPARK-3937) Unsafe memory access inside of Snappy library
[ https://issues.apache.org/jira/browse/SPARK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498189#comment-14498189 ] Guoqiang Li commented on SPARK-3937: [~joshrosen] This bug occurs every time. I'm not sure whether I can use the local-cluster to reproduce the bug. If it is successful, I'll post the code. Unsafe memory access inside of Snappy library - Key: SPARK-3937 URL: https://issues.apache.org/jira/browse/SPARK-3937 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0, 1.3.0 Reporter: Patrick Wendell This was observed on master between Spark 1.1 and 1.2. Unfortunately I don't have much information about this other than the stack trace. However, it was concerning enough I figured I should post it. {code} java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code org.xerial.snappy.SnappyNative.rawUncompress(Native Method) org.xerial.snappy.Snappy.rawUncompress(Snappy.java:444) org.xerial.snappy.Snappy.uncompress(Snappy.java:480) org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:355) org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159) org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142) java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310) java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2712) java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2742) java.io.ObjectInputStream.readArray(ObjectInputStream.java:1687) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133) org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350) org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) scala.collection.AbstractIterator.to(Iterator.scala:1157) scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) scala.collection.AbstractIterator.toArray(Iterator.scala:1157) org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:140) org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:140) org.apache.spark.SparkContext$$anonfun$runJob$3.apply(SparkContext.scala:1118) org.apache.spark.SparkContext$$anonfun$runJob$3.apply(SparkContext.scala:1118) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) 
org.apache.spark.scheduler.Task.run(Task.scala:56) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-6849) The constructor of GradientDescent should be public
[ https://issues.apache.org/jira/browse/SPARK-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491861#comment-14491861 ] Guoqiang Li commented on SPARK-6849: [~srowen] https://github.com/cloudml/zen The constructor of GradientDescent should be public --- Key: SPARK-6849 URL: https://issues.apache.org/jira/browse/SPARK-6849 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.1 Reporter: Guoqiang Li Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3937) Unsafe memory access inside of Snappy library
[ https://issues.apache.org/jira/browse/SPARK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491857#comment-14491857 ] Guoqiang Li commented on SPARK-3937: Get data: {code:none}wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/kdda.bz2{code} Get code: {code:none}git clone https://github.com/cloudml/zen.git{code} Build: {code:none}mvn -DskipTests clean package{code} spark-defaults.conf: {code:none} spark.yarn.dist.archives hdfs://ns1:8020/input/lbs/recommend/toona/spark/conf spark.yarn.user.classpath.first true spark.cleaner.referenceTracking.blocking true spark.cleaner.referenceTracking.cleanCheckpoints true spark.cleaner.referenceTracking.blocking.shuffle true spark.yarn.historyServer.address 10dian71:18080 spark.executor.cores 2 spark.yarn.executor.memoryOverhead 1 spark.yarn.driver.memoryOverhead 1 spark.executor.instances 36 spark.rdd.compress true spark.executor.memory 4g spark.akka.frameSize 20 spark.akka.askTimeout 120 spark.akka.timeout 120 spark.default.parallelism 72 spark.locality.wait 1 spark.core.connection.ack.wait.timeout 360 spark.storage.memoryFraction 0.1 spark.broadcast.factory org.apache.spark.broadcast.TorrentBroadcastFactory spark.driver.maxResultSize 4000 #spark.shuffle.blockTransferService nio #spark.akka.heartbeat.interval 100 #spark.kryoserializer.buffer.max.mb 128 spark.serializer org.apache.spark.serializer.KryoSerializer spark.kryo.registrator org.apache.spark.graphx.GraphKryoRegistrator #spark.kryo.registrator com.github.cloudml.zen.ml.clustering.LDAKryoRegistrator {code} Reproduce: {code:none}./bin/spark-shell --master yarn-client --driver-memory 8g --jars /opt/spark/classes/zen-assembly.jar{code} {code:none} import com.github.cloudml.zen.ml.regression.LogisticRegression import org.apache.spark.mllib.util.MLUtils import org.apache.spark.mllib.regression.LabeledPoint val dataSet = MLUtils.loadLibSVMFile(sc, "/input/lbs/recommend/kdda/*").repartition(72).cache() val numIterations = 150 val stepSize = 
0.1 val l1 = 0.0 val epsilon = 1e-6 val useAdaGrad = false LogisticRegression.trainMIS(dataSet, numIterations, stepSize, l1, epsilon, useAdaGrad) {code}
[jira] [Commented] (SPARK-6849) The constructor of GradientDescent should be public
[ https://issues.apache.org/jira/browse/SPARK-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490642#comment-14490642 ] Guoqiang Li commented on SPARK-6849: [~srowen] This should be considered a bug. The constructor of GradientDescent cannot be called outside of MLlib, so users cannot use GradientDescent in their own code.
[jira] [Commented] (SPARK-3937) Unsafe memory access inside of Snappy library
[ https://issues.apache.org/jira/browse/SPARK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486810#comment-14486810 ] Guoqiang Li commented on SPARK-3937: The bug seems to be triggered by {{spark.storage.memoryFraction 0.2}}; with {{spark.storage.memoryFraction 0.4}} the bug does not appear. This may be related to the size of the RDD.
[jira] [Commented] (SPARK-3937) Unsafe memory access inside of Snappy library
[ https://issues.apache.org/jira/browse/SPARK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486763#comment-14486763 ] Guoqiang Li commented on SPARK-3937: I have encountered this issue in Spark 1.3. My spark-defaults.conf is: {code:none} spark.akka.frameSize 20 spark.akka.askTimeout 120 spark.akka.timeout 120 spark.default.parallelism 72 spark.locality.wait 1 spark.storage.blockManagerTimeoutIntervalMs 600 #spark.yarn.max.executor.failures 100 spark.core.connection.ack.wait.timeout 360 spark.storage.memoryFraction 0.2 spark.broadcast.factory org.apache.spark.broadcast.TorrentBroadcastFactory #spark.broadcast.blockSize 8192 spark.driver.maxResultSize 4000 #spark.shuffle.blockTransferService nio #spark.akka.heartbeat.interval 100 spark.kryoserializer.buffer.max.mb 256 spark.serializer org.apache.spark.serializer.KryoSerializer spark.kryo.registrator org.apache.spark.graphx.GraphKryoRegistrator #spark.kryo.registrator org.apache.spark.mllib.clustering.LDAKryoRegistrator {code} {code:none} java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594) at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358) at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167) at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150) at com.esotericsoftware.kryo.io.Input.fill(Input.java:140) at com.esotericsoftware.kryo.io.Input.require(Input.java:169) at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337) at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721) at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:202) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code}
[jira] [Updated] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5261: --- Description: Get data: {code:none} normalize_text() { awk '{print tolower($0);}' | sed -e s/’/'/g -e s/′/'/g -e s/''/ /g -e s/'/ ' /g -e s/“/\/g -e s/”/\/g \ -e 's// /g' -e 's/\./ \. /g' -e 's/br \// /g' -e 's/, / , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \ -e 's/\?/ \? /g' -e 's/\;/ /g' -e 's/\:/ /g' -e 's/-/ - /g' -e 's/=/ /g' -e 's/=/ /g' -e 's/*/ /g' -e 's/|/ /g' \ -e 's/«/ /g' | tr 0-9 } wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz gzip -d news.2013.en.shuffled.gz normalize_text news.2013.en.shuffled data.txt {code} {code:none} import org.apache.spark.mllib.feature.Word2Vec val text = sc.textFile(dataPath).map { t = t.split( ).toIterable } val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(100) val model = word2Vec.fit(text) model.getVectors.map { t = t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size = res1: Float = 375059.84 val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(5) val model = word2Vec.fit(text) model.getVectors.map { t = t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size = res3: Float = 1661285.2 {code} The average absolute value of the word's vector representation is 60731.8 {code} val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(1) {code} The average absolute value of the word's vector representation is 0.13889 was: Get data: {code:none} normalize_text() { awk '{print tolower($0);}' | sed -e s/’/'/g -e s/′/'/g -e s/''/ /g -e s/'/ ' /g -e s/“/\/g -e s/”/\/g \ -e 's// /g' -e 's/\./ \. /g' -e 's/br \// /g' -e 's/, / , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \ -e 's/\?/ \? 
/g' -e 's/\;/ /g' -e 's/\:/ /g' -e 's/-/ - /g' -e 's/=/ /g' -e 's/=/ /g' -e 's/*/ /g' -e 's/|/ /g' \ -e 's/«/ /g' | tr 0-9 } wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz gzip -d news.2013.en.shuffled.gz normalize_text news.2013.en.shuffled data.txt {code} {code:none} import org.apache.spark.mllib.feature.Word2Vec val text = sc.textFile(dataPath).map { t = t.split( ).toIterable } val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(5) val model = word2Vec.fit(text) model.getVectors.map { t = t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size = res1: Float = 375059.84 val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(5) val model = word2Vec.fit(text) model.getVectors.map { t = t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size = res3: Float = 1661285.2 {code} The average absolute value of the word's vector representation is 60731.8 {code} val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(1) {code} The average absolute value of the word's vector representation is 0.13889 In some cases ,The value of word's vector representation is too big --- Key: SPARK-5261 URL: https://issues.apache.org/jira/browse/SPARK-5261 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.2.0 Reporter: Guoqiang Li Get data: {code:none} normalize_text() { awk '{print tolower($0);}' | sed -e s/’/'/g -e s/′/'/g -e s/''/ /g -e s/'/ ' /g -e s/“/\/g -e s/”/\/g \ -e 's// /g' -e 's/\./ \. /g' -e 's/br \// /g' -e 's/, / , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \ -e 's/\?/ \? 
/g' -e 's/\;/ /g' -e 's/\:/ /g' -e 's/-/ - /g' -e 's/=/ /g' -e 's/=/ /g' -e 's/*/ /g' -e 's/|/ /g' \ -e 's/«/ /g' | tr 0-9 } wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz gzip -d news.2013.en.shuffled.gz normalize_text news.2013.en.shuffled data.txt {code} {code:none} import org.apache.spark.mllib.feature.Word2Vec val text = sc.textFile(dataPath).map { t = t.split( ).toIterable } val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(100) val model = word2Vec.fit(text) model.getVectors.map { t = t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size = res1: Float = 375059.84 val word2Vec = new Word2Vec() word2Vec. setVectorSize(100). setSeed(42L). setNumIterations(5). setNumPartitions(36). setMinCount(5)
[jira] [Updated] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5261:
---
Description:

Get data:
{code:none}
normalize_text() {
  awk '{print tolower($0);}' | sed -e "s/’/'/g" -e "s/′/'/g" -e "s/''/ /g" -e "s/'/ ' /g" -e "s/“/\"/g" -e "s/”/\"/g" \
  -e 's/"/ " /g' -e 's/\./ \. /g' -e 's/<br \/>/ /g' -e 's/, / , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \
  -e 's/\?/ \? /g' -e 's/\;/ /g' -e 's/\:/ /g' -e 's/-/ - /g' -e 's/=/ /g' -e 's/=/ /g' -e 's/*/ /g' -e 's/|/ /g' \
  -e 's/«/ /g' | tr 0-9 " "
}
wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz
gzip -d news.2013.en.shuffled.gz
normalize_text < news.2013.en.shuffled > data.txt
{code}
{code:none}
import org.apache.spark.mllib.feature.Word2Vec

val text = sc.textFile(dataPath).map { t => t.split(" ").toIterable }

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(5)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res1: Float = 375059.84

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(100)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res3: Float = 1661285.2

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(1)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// 0.13889
{code}

was: the same script and code, but with setMinCount(100) on the first run and setMinCount(5) on the second.

In some cases, the value of a word's vector representation is too big
---
Key: SPARK-5261
URL: https://issues.apache.org/jira/browse/SPARK-5261
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.2.0
Reporter: Guoqiang Li
[jira] [Commented] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481378#comment-14481378 ] Guoqiang Li commented on SPARK-5261:

Sorry, the latter one's minCount is 100.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
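The gap between the 36-partition and the 1-partition runs suggests that per-partition contributions are being combined additively somewhere. As a purely numerical illustration of that hypothesis (an assumption about the cause, not an analysis of Spark's Word2Vec code; all names below are hypothetical), summing 36 per-partition copies of a word vector instead of averaging them inflates the average absolute coordinate by roughly the partition count:

```python
import random

def combine_sum(replicas):
    # Combine per-partition vectors by summing coordinates (the suspected behavior).
    return [sum(coords) for coords in zip(*replicas)]

def combine_avg(replicas):
    # Combine by averaging; magnitudes stay independent of the partition count.
    n = len(replicas)
    return [sum(coords) / n for coords in zip(*replicas)]

def avg_abs(vec):
    # Same metric as in the ticket: mean absolute coordinate value.
    return sum(abs(x) for x in vec) / len(vec)

random.seed(42)
# 36 partitions each produce a small-magnitude vector for the same word.
replicas = [[random.uniform(-0.1, 0.1) for _ in range(100)] for _ in range(36)]

summed = avg_abs(combine_sum(replicas))
averaged = avg_abs(combine_avg(replicas))
# Summing inflates the average absolute coordinate ~36x versus averaging.
print(summed / averaged)
```

This only shows the arithmetic of sum-vs-average combining; it does not claim this is the exact code path in MLlib.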
[jira] [Updated] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5261:
---
Description:

Get data with the same normalize_text script as above, then:
{code:none}
import org.apache.spark.mllib.feature.Word2Vec

val text = sc.textFile(dataPath).map { t => t.split(" ").toIterable }

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(100)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res1: Float = 375059.84

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(5)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res3: Float = 1661285.2

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(1)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// 0.13889
{code}

was: the same runs, but with the two results stated in prose rather than as code comments: the average absolute value of the word's vector representation is 60731.8 for the 36-partition run and 0.13889 for the 1-partition run.
[jira] [Updated] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-5261:
---
Description:

Get data with the same normalize_text script as above, then:
{code:none}
import org.apache.spark.mllib.feature.Word2Vec

val text = sc.textFile(dataPath).map { t => t.split(" ").toIterable }

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(5)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res1: Float = 375059.84

val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36).setMinCount(5)
val model = word2Vec.fit(text)
model.getVectors.map { t => t._2.map(_.abs).sum }.sum / 100 / model.getVectors.size
// res3: Float = 1661285.2
{code}

The average absolute value of the word's vector representation is 60731.8.

{code}
val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(1)
{code}

The average absolute value of the word's vector representation is 0.13889.

was:

{code}
val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(36)
{code}

The average absolute value of the word's vector representation is 60731.8.

{code}
val word2Vec = new Word2Vec()
word2Vec.setVectorSize(100).setSeed(42L).setNumIterations(5).setNumPartitions(1)
{code}

The average absolute value of the word's vector representation is 0.13889.
[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395613#comment-14395613 ] Guoqiang Li commented on SPARK-3625:

Sometimes we cannot guarantee that RDD.checkpoint is called before any job has been executed on the RDD, as in [PeriodicGraphCheckpointer|https://github.com/apache/spark/blob/branch-1.3/mllib/src/main/scala/org/apache/spark/mllib/impl/PeriodicGraphCheckpointer.scala].

In some cases, the RDD.checkpoint does not work
---
Key: SPARK-3625
URL: https://issues.apache.org/jira/browse/SPARK-3625
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li

The reproduce code:
{code}
sc.setCheckpointDir(checkpointDir)
val c = sc.parallelize((1 to 1000)).map(_ + 1)
c.count
val dep = c.dependencies.head.rdd
c.checkpoint()
c.count
assert(dep != c.dependencies.head.rdd)
{code}
This restriction is too strict; it makes it difficult to implement SPARK-3623.
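The restriction under discussion can be illustrated with a toy model (plain Python; the class and its methods are hypothetical stand-ins, not Spark's API): once any job has materialized the RDD, a later checkpoint() request never truncates the lineage, which is what the reproduce code above observes.

```python
class ToyRDD:
    """Toy model of the RDD.checkpoint restriction: a checkpoint request
    only truncates lineage if it is made before the first job runs."""

    def __init__(self, parent=None):
        self.parent = parent            # lineage dependency
        self.checkpoint_requested = False
        self.materialized = False

    def checkpoint(self):
        # Only marks the RDD; the actual work happens at job time.
        self.checkpoint_requested = True

    def count(self):
        # First job on this RDD: only now is a pending checkpoint honored.
        if not self.materialized:
            self.materialized = True
            if self.checkpoint_requested:
                self.parent = None      # lineage truncated
        return 1000

base = ToyRDD()
c = ToyRDD(parent=base)
c.count()          # a job runs before checkpoint() is called
c.checkpoint()     # too late: requested after materialization
c.count()
print(c.parent is base)   # lineage unchanged
```

Calling checkpoint() before the first count() on a fresh ToyRDD would set `parent` to None, mirroring the behavior the assert in the reproduce code expects but does not get.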
[jira] [Reopened] (SPARK-5261) In some cases, the value of a word's vector representation is too big
[ https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li reopened SPARK-5261:

[~srowen] SPARK-5261 and SPARK-4846 are not the same problem. This is an algorithm error; the resulting vectors are incorrect.
[jira] [Created] (SPARK-6697) PeriodicGraphCheckpointer does not clear edges.
Guoqiang Li created SPARK-6697:
---
Summary: PeriodicGraphCheckpointer does not clear edges.
Key: SPARK-6697
URL: https://issues.apache.org/jira/browse/SPARK-6697
Project: Spark
Issue Type: Bug
Components: GraphX, MLlib
Affects Versions: 1.3.0
Reporter: Guoqiang Li

When I run this [branch (lrGraphxSGD)|https://github.com/witgo/spark/tree/lrGraphxSGD], PeriodicGraphCheckpointer only clears the vertices.
{code}
def run(iterations: Int): Unit = {
  for (iter <- 1 to iterations) {
    logInfo(s"Start train (Iteration $iter/$iterations)")
    val margin = forward()
    margin.setName(s"margin-$iter").persist(storageLevel)
    println(s"train (Iteration $iter/$iterations) cost : ${error(margin)}")
    var gradient = backward(margin)
    gradient = updateDeltaSum(gradient, iter)
    dataSet = updateWeight(gradient, iter)
    dataSet.vertices.setName(s"vertices-$iter")
    dataSet.edges.setName(s"edges-$iter")
    dataSet.persist(storageLevel)
    graphCheckpointer.updateGraph(dataSet)
    margin.unpersist(blocking = false)
    gradient.unpersist(blocking = false)
    logInfo(s"End train (Iteration $iter/$iterations)")
    innerIter += 1
  }
  graphCheckpointer.deleteAllCheckpoints()
}
{code}
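The fix the report implies, unpersisting a graph's edges along with its vertices when an old graph is evicted, can be sketched as follows (a hypothetical Python model; the class and method names are made up for illustration and are not GraphX's actual PeriodicGraphCheckpointer):

```python
class FakeRDD:
    """Minimal stand-in for a cached RDD."""
    def __init__(self, name):
        self.name = name
        self.persisted = False
    def persist(self):
        self.persisted = True
    def unpersist(self):
        self.persisted = False

class FakeGraph:
    """A graph is a pair of RDDs, like GraphX's vertices and edges."""
    def __init__(self, name):
        self.vertices = FakeRDD(name + "-vertices")
        self.edges = FakeRDD(name + "-edges")

class PeriodicCheckpointer:
    """Keeps the most recent graphs cached; evicted graphs release
    BOTH of their RDDs (the edge unpersist is the step the report
    says is missing)."""
    def __init__(self, keep=2):
        self.keep = keep
        self.queue = []
    def update_graph(self, graph):
        graph.vertices.persist()
        graph.edges.persist()
        self.queue.append(graph)
        while len(self.queue) > self.keep:
            old = self.queue.pop(0)
            old.vertices.unpersist()
            old.edges.unpersist()   # without this line, edges accumulate in the cache

cp = PeriodicCheckpointer(keep=2)
graphs = [FakeGraph(f"iter-{i}") for i in range(5)]
for g in graphs:
    cp.update_graph(g)
# Only the two most recent graphs remain cached, edges included.
print([g.edges.persisted for g in graphs])
```

Dropping the `old.edges.unpersist()` line reproduces the symptom in the attached Web UI screenshot: vertex RDDs are released each iteration while edge RDDs pile up.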
[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395492#comment-14395492 ] Guoqiang Li commented on SPARK-3625:

When we run machine learning and graph algorithms, this feature is very necessary. I think we should merge PR 2480 into master.
[jira] [Updated] (SPARK-6697) PeriodicGraphCheckpointer does not clear edges.
[ https://issues.apache.org/jira/browse/SPARK-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-6697:
---
Attachment: QQ20150403-1.png

Web UI screenshot attached.