RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-23 Thread Venkat, Ankam
Spark Committers: Please advise the way forward for this issue. Thanks for your support. Regards, Venkat From: Venkat, Ankam Sent: Thursday, January 22, 2015 9:34 AM To: 'Frank Austin Nothaft'; 'user@spark.apache.org' Cc: 'Nick Allen' Subject: RE: How to 'Pipe' Binary Data in Apache Spark How

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Frank Austin Nothaft
: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The Spark RDD.pipe method pipes text data into a subprocess

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
How much time would it take to port it? Spark committers: Please let us know your thoughts. Regards, Venkat From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Thursday, January 22, 2015 9:08 AM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
: What's your take on this? Regards, Venkat Ankam From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The Spark RDD.pipe

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Silvio Fiorito
Nick, have you tried pcap4j (https://github.com/kaitoy/pcap4j)? I’ve used this in a Spark app already and didn’t have any issues. My use case was slightly different from yours, but you should give it a try. From: Nick Allen <n...@nickallen.org> Date: Friday, January 16, 2015 at

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Frank Austin Nothaft
: 'function' object has no attribute 'read' Any suggestions? Regards, Venkat Ankam From: Nick Allen [mailto:n...@nickallen.org] Sent: Friday, January 16, 2015 11:46 AM To: user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark I just wanted to reiterate

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Venkat, Ankam
different options. AttributeError: 'function' object has no attribute 'read' Any suggestions? Regards, Venkat Ankam From: Nick Allen [mailto:n...@nickallen.org] Sent: Friday, January 16, 2015 11:46 AM To: user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark I just wanted

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Sean Owen
Well it looks like you're reading some kind of binary file as text. That isn't going to work, in Spark or elsewhere, as binary data is not even necessarily a valid encoding of a string. There are no line breaks to delimit lines and thus elements of the RDD. Your input has some record structure
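The point above can be illustrated without Spark. The following is a minimal sketch in plain Python, assuming a made-up length-prefixed record format (not the poster's actual file format); `encode_records` and `decode_records` are hypothetical helper names. It shows that splitting binary data on newline bytes yields fragments that do not line up with the real records, and that the bytes may not decode as text at all.

```python
import struct

def encode_records(payloads):
    """Pack each payload as a 4-byte big-endian length followed by its bytes."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)

def decode_records(blob):
    """Walk the blob using the length prefixes; newline bytes get no special treatment."""
    records, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records

# Illustrative payloads; two of them happen to contain the newline byte 0x0a.
payloads = [b"\x00\x01\n\x02", b"\xff\xfe\n\n\x10", b"plain"]
blob = encode_records(payloads)

# What a line-oriented reader effectively does: cut wherever 0x0a appears.
by_newline = blob.split(b"\n")

# What the record structure actually calls for.
by_structure = decode_records(blob)

# Binary data is not necessarily valid text in any given encoding.
try:
    blob.decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False
```

Here `by_structure` recovers the original payloads, while `by_newline` cuts records wherever a newline byte happens to occur.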

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
Per your last comment, it appears I need something like this: https://github.com/RIPE-NCC/hadoop-pcap Thanks a ton. That gets me oriented in the right direction. On Fri, Jan 16, 2015 at 10:20 AM, Sean Owen <so...@cloudera.com> wrote: Well it looks like you're reading some kind of binary file

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
I just wanted to reiterate the solution for the benefit of the community. The problem is not from my use of 'pipe', but that 'textFile' cannot be used to read in binary data. (Doh) There are a couple of options to move forward. 1. Implement a custom 'InputFormat' that understands the binary input
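As a rough illustration of reading binary input by its record structure rather than by lines, here is a minimal Python sketch. It is not the custom 'InputFormat' mentioned above and not necessarily what was used in this thread. It assumes the input is a classic little-endian pcap capture with microsecond timestamps, and it assumes a PySpark version that provides `SparkContext.binaryFiles`; `parse_pcap` and `packets_rdd` are hypothetical helper names.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24   # magic, version, timezone, sigfigs, snaplen, link type
PCAP_RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len

def parse_pcap(data):
    """Return the raw packet payloads from a little-endian classic pcap file."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != 0xA1B2C3D4:
        raise ValueError("not a little-endian, microsecond-resolution pcap file")
    packets, offset = [], PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        # incl_len is the number of captured bytes stored for this packet.
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", data, offset)
        offset += PCAP_RECORD_HEADER_LEN
        packets.append(data[offset:offset + incl_len])
        offset += incl_len
    return packets

def packets_rdd(sc, path):
    """Hypothetical helper: build an RDD with one element per packet, not per text line."""
    # binaryFiles yields (filename, file_contents) pairs, one pair per whole file.
    return sc.binaryFiles(path).flatMap(lambda kv: parse_pcap(bytes(kv[1])))
```

Because `binaryFiles` loads each file whole, this sketch only suits captures that fit comfortably in a single task's memory. A splittable input format would be needed for very large files.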