Thanks, that sounds perfect.

On Tue, May 20, 2014 at 1:38 PM, Xiangrui Meng <men...@gmail.com> wrote:

> You can search for XMLInputFormat on Google. There are some
> implementations that allow you to specify the <tag> to split on, e.g.:
>
> https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java
>
> On Tue, May 20, 2014 at 10:31 AM, Nathan Kronenfeld
> <nkronenf...@oculusinfo.com> wrote:
> > Unfortunately, I don't have a bunch of moderately big xml files; I have
> > one, really big file - big enough that reading it into memory as a single
> > string is not feasible.
> >
> > On Tue, May 20, 2014 at 1:24 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> Try sc.wholeTextFiles(). It reads the entire file into a string
> >> record. -Xiangrui
> >>
> >> On Tue, May 20, 2014 at 8:25 AM, Nathan Kronenfeld
> >> <nkronenf...@oculusinfo.com> wrote:
> >> > We are trying to read some large GraphML files to use in spark.
> >> >
> >> > Is there an easy way to read XML-based files like this that accounts for
> >> > partition boundaries and the like?
> >> >
> >> > Thanks,
> >> > Nathan
> >> >
> >> > --
> >> > Nathan Kronenfeld
> >> > Senior Visualization Developer
> >> > Oculus Info Inc
> >> > 2 Berkeley Street, Suite 600,
> >> > Toronto, Ontario M5A 4J5
> >> > Phone: +1-416-203-3003 x 238
> >> > Email: nkronenf...@oculusinfo.com

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
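
For anyone finding this thread later: the idea behind a tag-splitting input format, as described above, can be sketched in a few lines. This is only a rough illustration in plain Python, not the Cloud9 XMLInputFormat code and not a Spark API. The function names (split_on_tag, _find_open) and the chunk size are made up for the example. It assumes the elements of interest are not nested inside elements of the same name and are not self-closing, and it ignores the extra work a Hadoop record reader does for records that straddle input split boundaries.

```python
import io
from typing import BinaryIO, Iterator

# Bytes that may legally follow the tag name in an opening tag like <node ...>
_AFTER_NAME = (b">", b" ", b"\t", b"\n", b"\r")


def _find_open(buf: bytes, open_prefix: bytes) -> int:
    """Index of the first opening tag in buf, or -1 if there is none.

    A prefix sitting at the very end of buf is reported as a candidate,
    because the byte that confirms or rejects it has not been read yet.
    """
    pos = 0
    while True:
        i = buf.find(open_prefix, pos)
        if i == -1:
            return -1
        nxt = buf[i + len(open_prefix): i + len(open_prefix) + 1]
        if nxt == b"" or nxt in _AFTER_NAME:
            return i
        pos = i + 1  # e.g. "<nodes" when looking for "<node": keep searching


def split_on_tag(stream: BinaryIO, tag: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield each <tag ...>...</tag> element, reading the stream in chunks.

    Memory use is bounded by the chunk size plus the largest single element,
    not by the size of the whole file.
    """
    open_prefix = f"<{tag}".encode()
    close = f"</{tag}>".encode()
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        while True:
            start = _find_open(buf, open_prefix)
            if start == -1:
                # Keep only a tail that could hold a partial opening tag.
                buf = buf[-(len(open_prefix) - 1):]
                break
            end = buf.find(close, start)
            if end == -1:
                # Element is incomplete: drop what precedes it, read more.
                buf = buf[start:]
                break
            end += len(close)
            yield buf[start:end]
            buf = buf[end:]
        if not chunk:  # end of stream
            return


if __name__ == "__main__":
    # Tiny made-up GraphML-like document, read in deliberately small chunks.
    doc = b'<graphml><graph><node id="a"></node><node id="b"></node></graph></graphml>'
    for record in split_on_tag(io.BytesIO(doc), "node", chunk_size=8):
        print(record.decode())
```

Each yielded record is small enough to hand to an ordinary XML parser, which is what makes the approach workable for one very large file.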