Re: Solr: extracting/indexing HTML via cURL

2012-05-02 Thread Lance Norskog
processing chain, but
>> that may be too much effort compared to the HTML strip filter.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: okayndc
>> Sent: Monday, April 30, 2012 10:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr: e

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
iginal Message- From: okayndc
> Sent: Monday, April 30, 2012 10:07 AM
> To: solr-user@lucene.apache.org
> Subject: Solr: extracting/indexing HTML via cURL
>
> Hello,
>
> Over the weekend I experimented with extracting HTML content via cURL and
> just wondering why the e

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky
nday, April 30, 2012 10:07 AM
To: solr-user@lucene.apache.org
Subject: Solr: extracting/indexing HTML via cURL

Hello,

Over the weekend I experimented with extracting HTML content via cURL and just wondering why the extraction/indexing process does not include the HTML tags. It seems as though

Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Hello,

Over the weekend I experimented with extracting HTML content via cURL, and I am wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags are either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to in