Hi, I've just read your message. Have you resolved the problem? If not, what are the contents of /etc/hosts?
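For reference, the report and logs quoted below show the datanode registering as 127.0.0.1 while the block pool ID embeds 192.168.2.211, which is the kind of mismatch a hosts-file entry can cause. A sane single-host layout looks roughly like this (the hostname and address here are placeholders, not taken from your setup):

    127.0.0.1       localhost
    192.168.2.211   yourhost.example.com    yourhost

If the machine's own hostname is mapped to 127.0.0.1 instead of its real address, daemons can end up binding and advertising the loopback interface.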
On Mon, Dec 19, 2016 at 10:09 PM, Michael Stratton <michael.strat...@komodohealth.com> wrote:

I don't think the issue is an empty partition, but it may not hurt to try a repartition prior to writing, just to rule it out, given the premature EOF exception.

On Mon, Dec 19, 2016 at 1:53 PM, Joseph Naegele <jnaeg...@grierforensics.com> wrote:

Thanks Michael, hdfs dfsadmin -report tells me:

Configured Capacity: 7999424823296 (7.28 TB)
Present Capacity: 7997657774971 (7.27 TB)
DFS Remaining: 7959091768187 (7.24 TB)
DFS Used: 38566006784 (35.92 GB)
DFS Used%: 0.48%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: XXX.XXX.XXX
Decommission Status : Normal
Configured Capacity: 7999424823296 (7.28 TB)
DFS Used: 38566006784 (35.92 GB)
Non DFS Used: 1767048325 (1.65 GB)
DFS Remaining: 7959091768187 (7.24 TB)
DFS Used%: 0.48%
DFS Remaining%: 99.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 17
Last contact: Mon Dec 19 13:00:06 EST 2016

The Hadoop exception occurs because it times out after 60 seconds in a "select" call on a java.nio.channels.SocketChannel while waiting to read from the socket. This implies the client writer isn't writing to the socket as expected, but shouldn't this all be handled by the Hadoop library within Spark?

It looks like a few similar, but rare, cases have been reported before, e.g. https://issues.apache.org/jira/browse/HDFS-770, which is *very* old.

If you're pretty sure Spark couldn't be responsible for issues at this level, I'll stick to the Hadoop mailing list.

Thanks
---
Joe Naegele
Grier Forensics

From: Michael Stratton [mailto:michael.strat...@komodohealth.com]
Sent: Monday, December 19, 2016 10:00 AM
To: Joseph Naegele <jnaeg...@grierforensics.com>
Cc: user <user@spark.apache.org>
Subject: Re: [Spark SQL] Task failed while writing rows

It seems like an issue w/ Hadoop. What do you get when you run hdfs dfsadmin -report?

Anecdotally (and w/o specifics, as it has been a while), I've generally used Parquet instead of ORC, as I've hit a bunch of random problems reading and writing ORC w/ Spark... but given ORC performs a lot better w/ Hive, it can be a pain.

On Sun, Dec 18, 2016 at 5:49 PM, Joseph Naegele <jnaeg...@grierforensics.com> wrote:

Hi all,

I'm having trouble with a relatively simple Spark SQL job. I'm using Spark 1.6.3. I have a dataset of around 500M rows (average 128 bytes per record). Its current compressed size is around 13 GB, but my problem started when it was much smaller, maybe 5 GB. This dataset is generated by performing a query on an existing ORC dataset in HDFS, selecting a subset of the existing data (i.e. removing duplicates).
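A minimal Spark 1.6 sketch of that kind of job, for concreteness (the paths are hypothetical, the dedup step is reduced to a plain dropDuplicates since the real query isn't shown in this thread, and ORC support in 1.6 comes via the Hive module, hence the HiveContext):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("dedup-orc"))
    val sqlContext = new HiveContext(sc)

    // Read the existing ORC dataset, drop duplicate rows, write the result back as ORC.
    val df = sqlContext.read.orc("hdfs:///data/input_orc")         // hypothetical input path
    val deduped = df.dropDuplicates()                               // stand-in for the real dedup query
    deduped.write.mode("overwrite").orc("hdfs:///data/output_orc") // hypothetical output path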
When I write this dataset to HDFS using ORC I get the following exceptions in the driver:

org.apache.spark.SparkException: Task failed while writing rows
Caused by: java.lang.RuntimeException: Failed to commit task
Suppressed: java.lang.IllegalArgumentException: Column has wrong number of index entries found: 0 expected: 32
Caused by: java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...

This happens multiple times. The executors tell me the following a few times before the same exceptions as above:

2016-12-09 02:38:12.193 INFO DefaultWriterContainer: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2016-12-09 02:41:04.679 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1695049761-192.168.2.211-1479228275669:blk_1073862425_121642
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)

My HDFS datanode says:

2016-12-09 02:39:24,783 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:57836, dest: /127.0.0.1:50010, bytes: 14808395, op: HDFS_WRITE, cliID: DFSClient_attempt_201612090102_0000_m_000025_0_956624542_193, offset: 0, srvID: 1003b822-200c-4b93-9f88-f474c0b6ce4a, blockid: BP-1695049761-192.168.2.211-1479228275669:blk_1073862420_121637, duration: 93026972
2016-12-09 02:39:24,783 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1695049761-192.168.2.211-1479228275669:blk_1073862420_121637, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-12-09 02:39:49,262 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XXX.XXX.XXX.XXX:50010:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:57790 dst: /127.0.0.1:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:57790]

It looks like the datanode is receiving the block on multiple ports (threads?) and one of the sending connections terminates early.

I was originally running 6 executors with 6 cores and 24 GB RAM each (total: 36 cores, 144 GB) and experienced many of these issues; occasionally my job would fail altogether. Lowering the number of cores appears to reduce the frequency of these errors. However, I'm now down to 4 executors with 2 cores each (total: 8 cores), which is significantly less, and I still see approximately 1-3 task failures.

Details:
- Spark 1.6.3 - Standalone
- RDD compression enabled
- HDFS replication disabled
- Everything running on the same host
- Otherwise vanilla configs for Hadoop and Spark

Does anybody have any ideas or hints? I can't imagine the problem is solely related to the number of executor cores.

Thanks,
Joe Naegele
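Two low-cost experiments follow from the thread: repartitioning before the write, as Michael suggested, and raising the HDFS socket timeouts that show up as the 60000 ms limit in the datanode log. A sketch, continuing the hypothetical names from the snippet above (the timeout values and partition count are guesses, not tested recommendations):

    // Client-side HDFS timeouts, in milliseconds. Note the datanode reads its own copy of
    // these keys from its hdfs-site.xml, so setting them here only covers the client half.
    sc.hadoopConfiguration.set("dfs.client.socket-timeout", "120000")
    sc.hadoopConfiguration.set("dfs.datanode.socket.write.timeout", "120000")

    // Repartition before writing to rule out skewed or empty partitions.
    deduped.repartition(64)
      .write.mode("overwrite")
      .orc("hdfs:///data/output_orc")   // hypothetical output path, as above

Since everything runs on one host, fewer and larger write tasks also mean fewer concurrent DFSClient pipelines hitting the single datanode, which would fit the observation that lowering executor cores reduces the error frequency.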