Re: RegexTransformer
On Mon, Mar 15, 2010 at 2:12 AM, blargy wrote:

> How would I go about splitting a column by a certain delimiter AND ignore
> all empty matches?
>
> For example:
>
>
> I have some columns that don't have a value for values, so it's actually
> getting indexed as blank. I just want to totally ignore those values. Is
> this possible?

You will probably have to write a custom Transformer to remove empty values.
See http://wiki.apache.org/solr/DIHCustomTransformer

--
Regards,
Shalin Shekhar Mangar.
Re: RegexTransformer
On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
> On Mon, Mar 15, 2010 at 2:12 AM, blargy wrote:
>
>> How would I go about splitting a column by a certain delimiter AND ignore
>> all empty matches.
[...]
> You will probably have to write a custom Transformer to remove empty values.
> See http://wiki.apache.org/solr/DIHCustomTransformer

Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory do the job?

See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

Greetings,
Michael
Re: RegexTransformer
On Mon, Mar 15, 2010 at 2:53 PM, Michael Kuhlmann <michael.kuhlm...@zalando.de> wrote:

> On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
> > On Mon, Mar 15, 2010 at 2:12 AM, blargy wrote:
> >
> >> How would I go about splitting a column by a certain delimiter AND
> >> ignore all empty matches.
> [...]
> > You will probably have to write a custom Transformer to remove empty
> > values.
> > See http://wiki.apache.org/solr/DIHCustomTransformer
>
> Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory
> do the job?
>
> See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

Yes but only on the indexed values. Empty values will still be stored and returned in the response unless you stop them from reaching the indexing chain.

--
Regards,
Shalin Shekhar Mangar.
Re: RegexTransformer
Thanks for the replies. I'll just roll my own transformer for this.

Shalin Shekhar Mangar wrote:
>
> On Mon, Mar 15, 2010 at 2:53 PM, Michael Kuhlmann <
> michael.kuhlm...@zalando.de> wrote:
>
>> On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
>> > On Mon, Mar 15, 2010 at 2:12 AM, blargy wrote:
>> >
>> >> How would I go about splitting a column by a certain delimiter AND
>> >> ignore all empty matches.
>> [...]
>> > You will probably have to write a custom Transformer to remove empty
>> > values.
>> > See http://wiki.apache.org/solr/DIHCustomTransformer
>>
>> Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory
>> do the job?
>>
>> See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
>
> Yes but only on the indexed values. Empty values will still be stored and
> returned in the response unless you stop them from reaching the indexing
> chain.
>
> --
> Regards,
> Shalin Shekhar Mangar.
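For anyone finding this thread later, here is a rough sketch of what such a transformer might look like. It is not code from the thread. It assumes DIH will accept a plain class exposing a transformRow(Map<String, Object>) method, as I understand the DIHCustomTransformer wiki page to describe. The column name "values" and the comma delimiter are placeholders for whatever your data-config uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical transformer: splits one column on a delimiter and drops empty pieces.
// Declare it public in its own file before referencing it from data-config.xml.
class SplitIgnoringEmpty {
    static final String COLUMN = "values"; // assumed column name
    static final String DELIMITER = ",";   // assumed literal delimiter

    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get(COLUMN);
        if (raw == null) {
            return row; // nothing to split
        }
        List<String> parts = new ArrayList<>();
        for (String piece : raw.toString().split(Pattern.quote(DELIMITER))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed); // keep only non-blank values
            }
        }
        if (parts.isEmpty()) {
            row.remove(COLUMN); // no usable values: drop the column entirely
        } else {
            row.put(COLUMN, parts); // a List becomes a multi-valued field
        }
        return row;
    }
}
```

Because the blank pieces are removed from the row itself, they never reach the indexing chain, so they are neither indexed nor stored.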
Re: RegexTransformer debugging (DIH)
If it is a normal exception, it is logged with the number of the document where it failed, and you can run it through the debugger with start=&rows=1.

We do not catch a Throwable or Error, so it slips through.

If you are adventurous enough, wrap the RegexTransformer with your own class, apply that with, say, transformer="my.ReegexWrapper", catch a Throwable and print out the row.

On Thu, Oct 16, 2008 at 9:49 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Is there a way to prevent this from occurring (or a way to nail down the doc
> which is causing it?):
>
> INFO: [news] webapp=/solr path=/admin/dataimport params={command=status}
> status=0 QTime=0
> Exception in thread "Thread-14" java.lang.StackOverflowError
>    at java.util.regex.Pattern$Single.match(Pattern.java:3313)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>    at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>    at java.util.regex.Pattern$All.match(Pattern.java:4079)
>
> Thanks.
>
> - Jon

--
--Noble Paul
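The wrapping idea above can be sketched without any Solr classes, roughly as follows. This is only an illustration: LoggingRowWrapper and its Function-based delegate are made up for the example. In a real DIH setup the wrapper would presumably need a no-argument constructor, would create the RegexTransformer itself, and would pass the DIH Context through as well.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper: runs a delegate transformer and reports the row that breaks it.
// The Function delegate stands in for RegexTransformer, which is not on the classpath here.
class LoggingRowWrapper {
    private final Function<Map<String, Object>, Object> delegate;

    // Last row that made the delegate fail, kept so it can be inspected afterwards.
    Map<String, Object> lastFailedRow;

    LoggingRowWrapper(Function<Map<String, Object>, Object> delegate) {
        this.delegate = delegate;
    }

    public Object transformRow(Map<String, Object> row) {
        try {
            return delegate.apply(row);
        } catch (Throwable t) {
            // Throwable also covers Errors such as StackOverflowError,
            // which a plain catch (Exception e) would let through.
            lastFailedRow = row;
            System.err.println("Transformer failed on row " + row + ": " + t);
            return row; // pass the row on unchanged so the import can continue
        }
    }
}
```

Whether to return the untouched row or rethrow after logging is a judgment call; returning it keeps the import running while still pointing at the offending document.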
Re: RegexTransformer - need help with regex value
Thanks a bunch, got it working with a reluctant quantifier and the use of &quot; as the escaped representation of double quotes within the regex value so that the config file doesn't crash & burn:

Cheers,
- Pulkit

On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal wrote:
> Hello,
>
> Feel free to point me to alternate sources of information if you deem
> this question unworthy of the Solr list :)
>
> But until then please hear me out!
>
> When my config is something like:
> regex=".*img src=.(.*)\.gif..alt=.*"
> sourceColName="description"
> />
> I don't get any data.
>
> But when my config is like:
> regex=".*img src=.(.*)..alt=.*"
> sourceColName="description"
> />
> I get the following data as the value for imageUrl:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif";
> width="64"
>
> As the result shows, this is a string that should be able to match
> even on the 1st regex=".*img src=.(.*)\.gif..alt=.*" and produce a
> result like:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_
> But it doesn't!
> Can anyone tell me why that would be the case?
> Is it something about the way RegexTransformer is wired or is it just
> my regex value that isn't right?
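For the archives, the difference between the patterns can be reproduced outside Solr with plain java.util.regex. This is only a sketch. The sample HTML string is invented, and the RELUCTANT pattern is a guess at the kind of fix described above, not the actual final config. The first pattern fails because it allows exactly two characters between ".gif" and "alt=", while the markup has a width attribute in between.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares the two patterns from the thread with a reluctant variant on made-up input.
class ImageUrlRegexDemo {
    // Invented sample; the real description column is not shown in the thread.
    static final String SAMPLE =
        "<img src=\"http://example.com/stars-5-0.gif\" width=\"64\" alt=\"5 stars\" />";

    // First pattern from the thread: only two characters allowed between .gif and alt=
    static final String STRICT = ".*img src=.(.*)\\.gif..alt=.*";
    // Second pattern from the thread: the greedy group runs on until just before alt=
    static final String GREEDY = ".*img src=.(.*)..alt=.*";
    // Assumed fix: reluctant group, then anything at all between .gif and alt=
    static final String RELUCTANT = ".*img src=.(.*?)\\.gif.*alt=.*";

    // Returns capture group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(STRICT, SAMPLE));    // null
        System.out.println(firstGroup(GREEDY, SAMPLE));    // http://example.com/stars-5-0.gif" width="64
        System.out.println(firstGroup(RELUCTANT, SAMPLE)); // http://example.com/stars-5-0
    }
}
```

In a Java string literal the double quotes are escaped with a backslash; inside an XML attribute in the data-config they have to be written as &quot; instead.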