Spark Committers: Please advise the way forward for this issue.
Thanks for your support.
Regards,
Venkat
From: Venkat, Ankam
Sent: Thursday, January 22, 2015 9:34 AM
To: 'Frank Austin Nothaft'; 'user@spark.apache.org'
Cc: 'Nick Allen'
Subject: RE: How to 'Pipe' Binary Data in Apache Spark
How
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Wednesday, January 21, 2015 12:30 PM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
Hi Venkat/Nick,
The Spark RDD.pipe method pipes text data into a subprocess
How much time would it take to port it?
Spark committers: Please let us know your thoughts.
Regards,
Venkat
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Thursday, January 22, 2015 9:08 AM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subject: Re: How to 'Pipe' Binary
: What's your take on this?
Regards,
Venkat Ankam
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Wednesday, January 21, 2015 12:30 PM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
Hi Venkat/Nick,
The Spark RDD.pipe
Nick,
Have you tried https://github.com/kaitoy/pcap4j ?
I’ve used this in a Spark app already and didn’t have any issues. My use case
was slightly different than yours, but you should give it a try.
From: Nick Allen <n...@nickallen.org>
Date: Friday, January 16, 2015 at
AttributeError: 'function' object has no attribute 'read'
Any suggestions?
Regards,
Venkat Ankam
From: Nick Allen [mailto:n...@nickallen.org]
Sent: Friday, January 16, 2015 11:46 AM
To: user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
I just wanted to reiterate
different
options.
AttributeError: 'function' object has no attribute 'read'
Any suggestions?
Regards,
Venkat Ankam
From: Nick Allen [mailto:n...@nickallen.org]
Sent: Friday, January 16, 2015 11:46 AM
To: user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
I just wanted
Well it looks like you're reading some kind of binary file as text.
That isn't going to work, in Spark or elsewhere, as binary data is not
even necessarily the valid encoding of a string. There are no line
breaks to delimit lines and thus elements of the RDD.
Your input has some record structure
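To illustrate the point about line breaks, here is a minimal sketch in plain Python (no Spark involved). It assumes a made-up record layout, a 4-byte big-endian length prefix followed by the payload, which is only a stand-in and is not the pcap format. The helper names `pack_records` and `read_records` are hypothetical. It shows that splitting binary data on newline bytes does not recover the records, while following the record structure does.

```python
import struct

def pack_records(payloads):
    """Serialize payloads as hypothetical length-prefixed records
    (4-byte big-endian length, then the payload bytes)."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)

def read_records(blob):
    """Walk the blob record by record using the length prefix,
    rather than relying on line breaks."""
    records = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records

# Payloads deliberately contain newline bytes and bytes that are not valid UTF-8.
payloads = [b"\x00\x01\n\x02", b"\xff\xfe\n\n", b"plain"]
blob = pack_records(payloads)

# Treating the data as "lines" cuts records wherever a 0x0A byte happens to occur.
by_newline = blob.split(b"\n")

# Following the record structure recovers the original payloads.
by_length = read_records(blob)

# Binary data is not necessarily a valid string encoding either.
try:
    payloads[1].decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False
```

In this sketch `by_length` matches the original payloads, while `by_newline` is a list of fragments that cut across record boundaries. A reader for a real format would follow that format's own header layout instead of the made-up prefix used here.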
Per your last comment, it appears I need something like this:
https://github.com/RIPE-NCC/hadoop-pcap
Thanks a ton. That gets me oriented in the right direction.
On Fri, Jan 16, 2015 at 10:20 AM, Sean Owen <so...@cloudera.com> wrote:
Well it looks like you're reading some kind of binary file
I just wanted to reiterate the solution for the benefit of the community.
The problem is not from my use of 'pipe', but that 'textFile' cannot be
used to read in binary data. (Doh) There are a couple of options to move
forward.
1. Implement a custom 'InputFormat' that understands the binary input