On Thu, 2009-09-24 at 16:32 +0200, Bertil Chapuis wrote:
> Of course, I agree, this should be improved. I did that because I wanted
> something which uses a SAX approach instead of XPath. I think it makes
> sense since XPath requires building a DOM from the page.
>
Nupp, that is not true though.

> In my case, I finally used another solution which is a bit different.
> In the handler, I use an embedded pipeline (Cocoon 3) to apply a
> sequence of XSLT transformations on the parse. The goal is to obtain an
> XML document in the form:
>
> <doc>
>   <field name="fieldname">value</field>
>   ...
> </doc>

jeje, lol you are using Cocoon. Nice. That is something I can very much
relate to (I have been using it since 2001). ;)

> Eventually, a custom Cocoon consumer sends the values to Solr and commits
> the result.
>
> I didn't send the code because the approach was a bit strange. However,
> in my case it works well and I can use it to handle nearly everything.

No, please, I would love to see this code since I actually do the same in
my use case connecting to Solr. You may know
http://wiki.apache.org/solr/SolrForrest which in the end works with
Cocoon 2.1 and 2.2.

> What do you think about such a solution? I have a little time next week
> so I should be able to provide something more decent.

This would be an awesome contribution. I would love to see it.

salu2

> Best regards,
>
> Bertil
>
> On Thu, 2009-09-24 at 11:19 +0200, Thorsten Scherler wrote:
> > On Wed, 2009-09-09 at 10:38 +0200, Bertil Chapuis wrote:
> > > Hello,
> > >
> > > My name is Bertil Chapuis. I am using Droids for a personal project and
> > > I am trying to create a more customizable Solr handler.
> > >
> > > I posted a ticket with my code (DROIDS-62). However, I am looking for a
> > > way to filter the handler's execution. I'd like to handle the documents
> > > only if their URI or content matches specific conditions.
> > >
> > > For example, the document is handled only if its URI matches the
> > > following regex:
> > >
> > > http://www.awebsite.com/document-[0-9]*.htm
> > >
> > > What's the best way to do that?
> >
> > I had a chance to test this patch but in the end I could not use it for
> > my use case.
> > The problem that I have with it is that it limits access to the
> > different elements in the tree too much. It is not generic, since
> > instead of using XPath expressions (the standard approach to solve
> > such a use case) it uses "standard regexp".
> >
> > Further, having a strong background in XML myself, it struck me as odd
> > to have element[0], which in XPath would have been element[1].
> >
> > IMO if you can add XPath support to this component then it really rocks
> > for many use cases, since we would have a generic parser solution to
> > extract information. The way it is now, it will work for very few use
> > cases.
> >
> > salu2
> >
> > > Is it delegated to the handler's
> > > implementation or is there a standard way?
> > >
> > > Best regards,
> > >
> > > Bertil Chapuis

--
Thorsten Scherler <thorsten.at.apache.org>
Open Source Java <consulting, training and solutions>

Sociedad Andaluza para el Desarrollo de la Sociedad de la Información, S.A.U. (SADESI)
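
For readers following the element[0] versus element[1] remark in the thread, here is a minimal sketch of XPath's 1-based positional predicates, using the JDK's standard javax.xml.xpath API. It is not the DROIDS-62 patch or any Droids code: the class name XPathIndexSketch, the helper evaluate, and the sample field names "title" and "body" are made up for illustration. Only the <doc>/<field name="..."> shape comes from the message above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathIndexSketch {
    // Hypothetical sample data in the <doc><field name="...">value</field></doc> shape.
    static final String SAMPLE =
        "<doc>"
        + "<field name=\"title\">First</field>"
        + "<field name=\"body\">Second</field>"
        + "</doc>";

    /** Evaluates an XPath expression against an XML string and returns its string value. */
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Wrap the checked exception so callers can stay simple.
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Positions start at 1, so [1] selects the first <field>.
        System.out.println(evaluate(SAMPLE, "/doc/field[1]"));
        // Selecting by attribute avoids depending on position at all.
        System.out.println(evaluate(SAMPLE, "/doc/field[@name='body']"));
        // [0] matches no node, so the string value is empty.
        System.out.println("[" + evaluate(SAMPLE, "/doc/field[0]") + "]");
    }
}
```

Selecting by the name attribute rather than by position is usually the more robust choice when the field order in the generated document is not guaranteed.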
