Hi Matt,

Nutch segments are stored as Hadoop SequenceFiles and MapFiles. A MapFile is a directory containing two SequenceFiles: a sorted "data" file and an "index" file used for lookups. I'm not certain whether the format is documented anywhere, but the source is in org.apache.hadoop.io. I doubt you'll find a PHP library for reading them, so you'll probably have to write something yourself.
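To give an idea of what writing a reader involves, here is a rough sketch in Python of parsing the start of a SequenceFile header. It is based on my reading of SequenceFile.java and WritableUtils.java, so treat the layout as an assumption to check against the source: a 3-byte "SEQ" magic, one version byte, then the key and value class names as vint-length-prefixed UTF-8 strings, then two boolean bytes for compression. The function names are made up for illustration, and the sketch stops before the codec name, metadata, and sync marker that follow.

```python
import io
import struct


def read_vint(stream):
    """Decode a Hadoop variable-length integer (as in WritableUtils.readVInt)."""
    first = struct.unpack("b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # The first byte encodes how many payload bytes follow.
    n_bytes = (-120 - first) if negative else (-112 - first)
    value = 0
    for b in stream.read(n_bytes):
        value = (value << 8) | b
    return ~value if negative else value


def read_text(stream):
    """Read a string written as a vint byte length followed by UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_header(stream):
    """Parse the leading fields of a SequenceFile header (assumed layout)."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: bad magic")
    version = stream.read(1)[0]
    if version < 4:
        # Assumption: older versions encode class names differently.
        raise ValueError("header version %d not handled by this sketch" % version)
    return {
        "version": version,
        "key_class": read_text(stream),
        "value_class": read_text(stream),
        "compressed": stream.read(1) != b"\x00",
        "block_compressed": stream.read(1) != b"\x00",
    }
```

You would call it with a file opened in binary mode, e.g. read_header(open(path, "rb")), where path points at one of the part files inside a segment. The same byte-level steps should carry over to PHP with fread and unpack.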
-Todd

On Mon, Jan 5, 2009 at 10:32 AM, Matt Pearson <mpear...@lizearle.com> wrote:
> Hi Everyone,
>
> I'm looking into reading data from Nutch segments with PHP is there
> anywhere where I can get information on the format in which the data is
> stored?
>
> Thanks and apologies if this isn't the right place to ask this question.
>
> Matt Pearson