Nick,

Have you tried https://github.com/kaitoy/pcap4j

I’ve used this in a Spark app already and didn’t have any issues. My use case 
was slightly different from yours, but you should give it a try.

From: Nick Allen <n...@nickallen.org>
Date: Friday, January 16, 2015 at 10:09 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: How to 'Pipe' Binary Data in Apache Spark


I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe 
that binary data to an external program that will translate it to string/text 
data. Unfortunately, it seems that Spark is mangling the binary data before it 
is passed to the external program.

This code is representative of what I am trying to do. What am I doing wrong? 
How can I pipe binary data in Spark? Maybe the data is already being corrupted 
when I first read it in with 'textFile'?

bin = sc.textFile("binary-data.dat")
csv = bin.pipe("/usr/bin/binary-to-csv.sh")
csv.saveAsTextFile("text-data.csv")
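For what it's worth, here is a plain-Python sketch (no Spark involved, file name hypothetical) of why a text-mode read like 'textFile' can mangle binary input: decoding raw bytes as UTF-8 is lossy for byte values that aren't valid UTF-8, while a binary read (open in 'rb' mode, or 'sc.binaryFiles' on the Spark side) preserves them byte for byte.

```python
import os
import tempfile

# A few bytes typical of pcap data: the little-endian magic number
# d4 c3 b2 a1, plus values that are not valid UTF-8 and embedded
# newline/carriage-return bytes.
raw = bytes([0xD4, 0xC3, 0xB2, 0xA1, 0x00, 0xFF, 0x0A, 0x0D])

path = os.path.join(tempfile.mkdtemp(), "binary-data.dat")
with open(path, "wb") as f:
    f.write(raw)

# Text-mode read: decode as UTF-8. Invalid bytes such as 0xD4 and 0xFF
# get substituted, so the original data can no longer be recovered.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    text = f.read()
assert text.encode("utf-8") != raw  # round trip no longer matches

# Binary read: the bytes survive untouched.
with open(path, "rb") as f:
    assert f.read() == raw
```

Note too, if I remember the semantics right, that 'RDD.pipe' talks to the external program line-by-line over stdin/stdout, so even correctly read binary data with embedded newlines would need an encoding step (e.g. base64) before being piped.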

Specifically, I am trying to use Spark to transform pcap (packet capture) data 
to text/csv so that I can perform an analysis on it.

Thanks!

--
Nick Allen <n...@nickallen.org>
