[ 
https://issues.apache.org/jira/browse/CASSANDRA-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318728#comment-14318728
 ] 

Erik Forsberg commented on CASSANDRA-8774:
------------------------------------------

I'm not the author of the original code, but when adding some debug printouts 
showing the output of getCause(), I also saw a hadoop-related ExecutorException 
(something about interrupted sending of progress or similar; I don't have the 
details here). So I think there might be other reasons the code looks the way 
it does.

It's also possible that StreamException should be caught somewhere else, with 
the future just returning. The check after the while loop takes care of errors 
even if there's no StreamException, i.e. if the future returns successfully.
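
To make the idea concrete, here is a rough sketch of a wait loop that treats a 
failed future differently from one that is merely slow. It uses only 
java.util.concurrent and is not the actual Cassandra code or the attached 
patch: awaitStream, outcome, reportProgress and pollMillis are hypothetical 
names, and the CompletableFuture just stands in for the streaming future. My 
assumption is that a loop which reports progress and retries on 
ExecutionException would spin forever, because get() on an already-failed 
future throws again immediately.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BulkWaitSketch {
    /**
     * Polls the future until it finishes. Only a timeout counts as
     * "still running"; a failed future is surfaced as an IOException.
     * (Hypothetical helper, not part of Cassandra or Hadoop.)
     */
    static <T> T awaitStream(Future<T> future, Runnable reportProgress, long pollMillis)
            throws IOException {
        while (true) {
            try {
                return future.get(pollMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not done yet: tell the framework we are alive, then poll again.
                reportProgress.run();
            } catch (ExecutionException e) {
                // The future has failed for good. Retrying get() would throw
                // the same exception at once, so fail instead of looping.
                throw new IOException("Streaming failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
    }

    /** Small wrapper that turns the result into a printable string. */
    static String outcome(Future<?> future) {
        try {
            return "completed: " + awaitStream(future, () -> {}, 100);
        } catch (IOException e) {
            return "task fails: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a stream session that could not reach one node.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("Connection timed out"));
        System.out.println(outcome(failed)); // prints "task fails: Connection timed out"
    }
}
```

Whether failing immediately is right for every ExecutionException is exactly 
the open question above; the sketch only shows that the two cases can be told 
apart inside the loop.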

I get the feeling this bug probably sneaked in during the streaming rewrite in 
2.0. 

All the above to be taken as random ramblings by someone who doesn't fully 
understand everything in the Cassandra codebase. 

> BulkOutputFormat never completes if streaming have errors
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8774
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Erik Forsberg
>             Fix For: 2.0.13
>
>         Attachments: 
> 0001-CASSANDRA-8774-Handle-StreamException-when-bulkloadi.patch
>
>
> With BulkOutputFormat in Cassandra 1.2.18, if any streaming errors occurred, 
> the hadoop task would fail. This doesn't seem to happen with 2.0.12.
> I have a hadoop map task that uses BulkOutputFormat. If one of the cassandra 
> nodes I'm writing to is down, I'm getting the following syslog output from 
> the map task:
> {noformat}
> 2015-02-10 10:54:15,162 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
> the native-hadoop library
> 2015-02-10 10:54:15,601 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2015-02-10 10:54:15,901 INFO org.apache.hadoop.util.ProcessTree: setsid 
> exited with exit code 0
> 2015-02-10 10:54:15,907 INFO org.apache.hadoop.mapred.Task:  Using 
> ResourceCalculatorPlugin : 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4984451e
> 2015-02-10 10:54:16,110 INFO org.apache.hadoop.mapred.MapTask: Processing 
> split: 
> hdfs://hdpmt01.osp-hadoop.osa:9000/user/jenkins/syst/5ef13_osp/tvstore/sumcombinations/hourly/2015021002/per_period-5ba2faa4b11111e4aa21fa163e82bc46-sumcombinations/0/data/part-00047:0+462
> 2015-02-10 10:54:16,739 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: 
> Successfully loaded & initialized native-zlib library
> 2015-02-10 10:54:16,740 INFO org.apache.hadoop.io.compress.CodecPool: Got 
> brand-new decompressor
> 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got 
> brand-new decompressor
> 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got 
> brand-new decompressor
> 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got 
> brand-new decompressor
> 2015-02-10 10:54:16,927 ERROR org.apache.cassandra.cql3.QueryProcessor: 
> Unable to initialize MemoryMeter (jamm not specified as javaagent).  This 
> means Cassandra will be unable to measure object sizes accurately and may 
> consequently OOM.
> 2015-02-10 10:54:17,780 INFO org.apache.cassandra.utils.CLibrary: JNA not 
> found. Native methods will be disabled.
> 2015-02-10 10:54:19,446 INFO org.apache.cassandra.io.sstable.SSTableReader: 
> Opening 
> /opera/log1/hadoop/mapred/local/taskTracker/jenkins/jobcache/job_201502041226_13903/attempt_201502041226_13903_m_000000_0/work/tmp/syst5ef13osp/Data_hourly/syst5ef13osp-Data_hourly-jb-1
>  (1018 bytes)
> 2015-02-10 10:54:20,713 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Executing streaming plan for Bulk Load
> 2015-02-10 10:54:20,713 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:7
> 2015-02-10 10:54:20,714 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:8
> 2015-02-10 10:54:20,715 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:7
> 2015-02-10 10:54:20,730 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:4
> 2015-02-10 10:54:20,750 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:3
> 2015-02-10 10:54:20,731 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:8
> 2015-02-10 10:54:20,750 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:4
> 2015-02-10 10:54:20,770 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:6
> 2015-02-10 10:54:20,778 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with 
> /ipv6:prefix:1:441:0:0:0:5
> 2015-02-10 10:54:20,786 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:3
> 2015-02-10 10:54:20,790 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:6
> 2015-02-10 10:54:20,867 INFO org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to 
> /ipv6:prefix:1:441:0:0:0:5
> 2015-02-10 10:54:20,897 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Prepare completed. Receiving 0 files(0 
> bytes), sending 1 files(152 bytes)
> 2015-02-10 10:54:20,897 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Prepare completed. Receiving 0 files(0 
> bytes), sending 1 files(491 bytes)
> 2015-02-10 10:54:20,897 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Prepare completed. Receiving 0 files(0 
> bytes), sending 1 files(375 bytes)
> 2015-02-10 10:54:20,897 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Prepare completed. Receiving 0 files(0 
> bytes), sending 1 files(322 bytes)
> 2015-02-10 10:54:20,898 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Prepare completed. Receiving 0 files(0 
> bytes), sending 1 files(169 bytes)
> 2015-02-10 10:54:20,983 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:7 is complete
> 2015-02-10 10:54:20,984 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:8 is complete
> 2015-02-10 10:54:20,985 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:3 is complete
> 2015-02-10 10:54:20,997 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:4 is complete
> 2015-02-10 10:54:21,067 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:6 is complete
> 2015-02-10 10:55:24,027 WARN 
> org.apache.cassandra.streaming.DefaultConnectionFactory: Failed attempt 1 to 
> connect to /ipv6:prefix:1:441:0:0:0:5. Retrying in 20000 ms. 
> (java.net.ConnectException: Connection timed out)
> 2015-02-10 10:56:54,344 WARN 
> org.apache.cassandra.streaming.DefaultConnectionFactory: Failed attempt 2 to 
> connect to /ipv6:prefix:1:441:0:0:0:5. Retrying in 40000 ms. 
> (java.net.ConnectException: Connection timed out)
> 2015-02-10 10:58:37,493 ERROR org.apache.cassandra.streaming.StreamSession: 
> [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Streaming error occurred
> java.net.ConnectException: Connection timed out
>       at sun.nio.ch.Net.connect0(Native Method)
>       at sun.nio.ch.Net.connect(Net.java:465)
>       at sun.nio.ch.Net.connect(Net.java:457)
>       at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
>       at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
>       at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:134)
>       at 
> org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:52)
>       at 
> org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:235)
>       at 
> org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:77)
>       at 
> org.apache.cassandra.streaming.StreamSession$1.run(StreamSession.java:221)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-02-10 10:58:37,502 INFO 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Session with 
> /ipv6:prefix:1:441:0:0:0:5 is complete
> 2015-02-10 10:58:37,507 WARN 
> org.apache.cassandra.streaming.StreamResultFuture: [Stream 
> #29f27cd0-b113-11e4-a465-91cc09fc46f1] Stream failed
> {noformat}
> The hadoop task will then just sit there and do nothing forever, which 
> indicates that it's getting progress updates from the cassandra code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
