Re: reading large XML files
Thanks, that sounds perfect.

On Tue, May 20, 2014 at 1:38 PM, Xiangrui Meng wrote:
> You can search for XMLInputFormat on Google. There are some
> implementations that allow you to specify the tag to split on, e.g.:
>
> https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
Re: reading large XML files
You can search for XMLInputFormat on Google. There are some implementations that allow you to specify the tag to split on, e.g.:

https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java

On Tue, May 20, 2014 at 10:31 AM, Nathan Kronenfeld wrote:
> Unfortunately, I don't have a bunch of moderately big xml files; I have one,
> really big file - big enough that reading it into memory as a single string
> is not feasible.
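For readers who want to see why splitting on a tag copes with partition boundaries, here is a minimal sketch of the idea in plain Python. It is not the Cloud9 code and does not use Spark or Hadoop; the names `find_from`, `read_records`, and `demo`, the sample data, and the split and chunk sizes are all assumptions made for illustration. The rule it shows is that a record belongs to the byte range in which its start tag begins, and the reader may run past the end of its range to finish that record, so no record is lost or read twice however the file is cut.

```python
import os
import tempfile


def find_from(f, pattern, pos, chunk_size):
    """Return the file offset of the first `pattern` at or after `pos`, or -1."""
    overlap = len(pattern) - 1  # bytes carried over so a match can span chunks
    f.seek(pos)
    tail = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        hit = window.find(pattern)
        if hit != -1:
            return pos - len(tail) + hit
        pos += len(chunk)
        tail = window[-overlap:] if overlap else b""


def read_records(path, start, end, start_tag, end_tag, chunk_size=64 * 1024):
    """Yield each record whose start tag begins inside the byte range [start, end).

    A record runs from `start_tag` through the next `end_tag`. Elements of the
    same name nested inside each other are not handled in this sketch.
    """
    with open(path, "rb") as f:
        pos = start
        while True:
            rec_start = find_from(f, start_tag, pos, chunk_size)
            if rec_start == -1 or rec_start >= end:
                return  # the next record, if any, belongs to a later range
            close = find_from(f, end_tag, rec_start + len(start_tag), chunk_size)
            if close == -1:
                return  # unterminated record at end of file
            rec_end = close + len(end_tag)
            f.seek(rec_start)
            yield f.read(rec_end - rec_start)  # may read past `end` on purpose
            pos = rec_end


def demo():
    """Cut a small GraphML-like file into tiny ranges and read every node once."""
    nodes = [f'<node id="n{i}"><data key="d0">v{i}</data></node>' for i in range(10)]
    xml = "<graphml><graph>\n" + "\n".join(nodes) + "\n</graph></graphml>\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".graphml", delete=False) as tmp:
        tmp.write(xml.encode("utf-8"))
        path = tmp.name
    try:
        size = os.path.getsize(path)
        split_size = 37  # deliberately smaller than one record
        records = []
        for start in range(0, size, split_size):
            end = min(start + split_size, size)
            records.extend(
                read_records(path, start, end, b"<node ", b"</node>", chunk_size=16)
            )
        return [r.decode("utf-8") for r in records], nodes
    finally:
        os.remove(path)


if __name__ == "__main__":
    got, expected = demo()
    print(len(got), "records read; matches input:", got == expected)
```

Only one record at a time is held in memory, which is the point for a single file too large to load as one string. In Spark, an input format of this kind would normally be plugged in through the Hadoop file APIs rather than written by hand like this.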
Re: reading large XML files
Unfortunately, I don't have a bunch of moderately big xml files; I have one, really big file - big enough that reading it into memory as a single string is not feasible.

On Tue, May 20, 2014 at 1:24 PM, Xiangrui Meng wrote:
> Try sc.wholeTextFiles(). It reads the entire file into a string
> record. -Xiangrui
Re: reading large XML files
Try sc.wholeTextFiles(). It reads the entire file into a string record.

-Xiangrui

On Tue, May 20, 2014 at 8:25 AM, Nathan Kronenfeld wrote:
> We are trying to read some large GraphML files to use in spark.
>
> Is there an easy way to read XML-based files like this that accounts for
> partition boundaries and the like?
>
> Thanks,
> Nathan