Re: RegexTransformer

2010-03-15 Thread Shalin Shekhar Mangar
On Mon, Mar 15, 2010 at 2:12 AM, blargy  wrote:

>
> How would I go about splitting a column by a certain delimiter AND ignoring
> all empty matches?
>
> For example:
>
> 
>
> Some of my columns have no value for this field, so a blank value is
> actually getting indexed. I just want to ignore those values entirely. Is
> this possible?
>
>
You will probably have to write a custom Transformer to remove empty values.
See http://wiki.apache.org/solr/DIHCustomTransformer
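
As a rough illustration of such a transformer, here is a minimal sketch. It
assumes, as I understand the wiki page above, that DIH will call any class
exposing a transformRow(Map<String, Object>) method by reflection, and that
a column split by RegexTransformer arrives as a List. The class name
EmptyValueTransformer is hypothetical, and it would need to be listed after
RegexTransformer in the entity's transformer attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical class name, not part of Solr.
class EmptyValueTransformer {

    // Assumed entry point: a method with this signature, found by reflection.
    public Object transformRow(Map<String, Object> row) {
        Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            Object value = entry.getValue();
            if (value instanceof List) {
                // Multi-valued column: keep only the non-blank items.
                List<Object> kept = new ArrayList<>();
                for (Object item : (List<?>) value) {
                    if (!isBlank(item)) {
                        kept.add(item);
                    }
                }
                if (kept.isEmpty()) {
                    it.remove();           // nothing left, drop the column
                } else {
                    entry.setValue(kept);
                }
            } else if (isBlank(value)) {
                it.remove();               // single blank value, drop the column
            }
        }
        return row;
    }

    private static boolean isBlank(Object value) {
        return value == null
                || (value instanceof String && ((String) value).trim().isEmpty());
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "42");
        row.put("tags", new ArrayList<>(Arrays.asList("red", "", "blue", " ")));
        row.put("notes", "");
        System.out.println(new EmptyValueTransformer().transformRow(row));
    }
}
```

Because the blank values are removed from the row itself, they never reach
the indexing chain, so they are neither indexed nor stored.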

-- 
Regards,
Shalin Shekhar Mangar.


Re: RegexTransformer

2010-03-15 Thread Michael Kuhlmann
On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
> On Mon, Mar 15, 2010 at 2:12 AM, blargy  wrote:
> 
>>
>> How would I go about splitting a column by a certain delimiter AND ignoring
>> all empty matches?
[...]
> You will probably have to write a custom Transformer to remove empty values.
> See http://wiki.apache.org/solr/DIHCustomTransformer
> 
Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory
do the job?

See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

Greetings,
Michael


Re: RegexTransformer

2010-03-15 Thread Shalin Shekhar Mangar
On Mon, Mar 15, 2010 at 2:53 PM, Michael Kuhlmann <
michael.kuhlm...@zalando.de> wrote:

> On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
> > On Mon, Mar 15, 2010 at 2:12 AM, blargy  wrote:
> >
> >>
> >> How would I go about splitting a column by a certain delimiter AND
> >> ignoring all empty matches?
> [...]
> > You will probably have to write a custom Transformer to remove empty
> > values.
> > See http://wiki.apache.org/solr/DIHCustomTransformer
> >
> Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory
> do the job?
>
> See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
>
>
Yes, but only for the indexed values. Empty values will still be stored and
returned in the response unless you stop them from reaching the indexing
chain.

-- 
Regards,
Shalin Shekhar Mangar.


Re: RegexTransformer

2010-03-15 Thread blargy

Thanks for the replies. I'll just roll my own transformer for this.






Re: RegexTransformer debugging (DIH)

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
If it is a normal exception, it is logged with the number of the document
where it failed, and you can reproduce it in the debugger with start=&rows=1.

We do not catch Throwable or Error, so a StackOverflowError slips through.

If you are adventurous enough, wrap the RegexTransformer with your own
class, apply it with, say, transformer="my.RegexWrapper", and catch
Throwable and print out the row.
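
A minimal sketch of that wrapping idea follows. To keep it self-contained it
does not use the Solr classes: the delegate is a plain function standing in
for the RegexTransformer's transformRow call, and the name
GuardedTransformer is hypothetical. Treat it as an outline of the technique
rather than a drop-in DIH class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical wrapper; in a real DIH setup the delegate call would be the
// wrapped transformer's own transformRow(...) rather than a lambda.
class GuardedTransformer {
    private final UnaryOperator<Map<String, Object>> delegate;
    private Map<String, Object> lastFailedRow;

    GuardedTransformer(UnaryOperator<Map<String, Object>> delegate) {
        this.delegate = delegate;
    }

    public Object transformRow(Map<String, Object> row) {
        try {
            return delegate.apply(row);
        } catch (Throwable t) {
            // Throwable also covers Errors such as StackOverflowError,
            // which a catch (Exception e) block would miss.
            lastFailedRow = row;
            System.err.println("Transformer failed with " + t + " on row: " + row);
            return row; // pass the row through untransformed
        }
    }

    Map<String, Object> lastFailedRow() {
        return lastFailedRow;
    }

    public static void main(String[] args) {
        // Simulate the failure instead of relying on a pathological regex.
        GuardedTransformer guarded = new GuardedTransformer(r -> {
            throw new StackOverflowError("simulated regex overflow");
        });
        Map<String, Object> row = new HashMap<>();
        row.put("id", "7");
        guarded.transformRow(row);
    }
}
```

Catching Throwable is only reasonable here as a debugging aid for finding the
offending row; once the document is identified, the wrapper should come out
again.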




On Thu, Oct 16, 2008 at 9:49 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Is there a way to prevent this from occurring (or a way to nail down the doc
> which is causing it?):
>
> INFO: [news] webapp=/solr path=/admin/dataimport params={command=status}
> status=0 QTime=0
> Exception in thread "Thread-14" java.lang.StackOverflowError
>     at java.util.regex.Pattern$Single.match(Pattern.java:3313)
>     at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
>     at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>     at java.util.regex.Pattern$All.match(Pattern.java:4079)
>     at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>     at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>     at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>     at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>     at java.util.regex.Pattern$All.match(Pattern.java:4079)
>     at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>     at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>     at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>     at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>     at java.util.regex.Pattern$All.match(Pattern.java:4079)
>
> Thanks.
>
> - Jon
>
>



-- 
--Noble Paul


Re: RegexTransformer - need help with regex value

2011-09-14 Thread Pulkit Singhal
Thanks a bunch, got it working with a reluctant quantifier and the use
of &quot; as the escaped representation of double quotes within the
regex value so that the config file doesn't crash & burn:



Cheers,
- Pulkit

On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal  wrote:
> Hello,
>
> Feel free to point me to alternate sources of information if you deem
> this question unworthy of the Solr list :)
>
> But until then please hear me out!
>
> When my config is something like:
>                               regex=".*img src=.(.*)\.gif..alt=.*"
>                   sourceColName="description"
>                   />
> I don't get any data.
>
> But when my config is like:
>                               regex=".*img src=.(.*)..alt=.*"
>                   sourceColName="description"
>                   />
> I get the following data as the value for imageUrl:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif";
> width="64"
>
> As the result shows, this is a string that should be able to match
> even on the 1st regex=".*img src=.(.*)\.gif..alt=.*" and produce a
> result like:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_
> But it doesn't!
> Can anyone tell me why that would be the case?
> Is it something about the way RegexTransformer is wired or is it just
> my regex value that isn't right?
>
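
The behaviour of the two patterns quoted above can be reproduced with plain
java.util.regex. The sketch below assumes that RegexTransformer returns the
first capture group of a successful match; the sample description string and
the third pattern are made up for illustration and are not the poster's
actual data or final config. In the first pattern, "\.gif..alt=" allows only
two characters between ".gif" and "alt=", so it cannot match when other
attributes such as width sit in between.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageUrlRegexDemo {
    // Hypothetical input; the real description column was not posted in full.
    static final String DESCRIPTION =
            "<p><img src=\"http://example.com/stars-5-0.gif\" width=\"64\" alt=\"5 stars\" /></p>";

    // Pattern from the first config: exactly two characters between ".gif" and "alt=".
    static final String FIRST = ".*img src=.(.*)\\.gif..alt=.*";
    // Pattern from the second config: greedy group running up to two characters before "alt=".
    static final String SECOND = ".*img src=.(.*)..alt=.*";
    // Assumed fix: reluctant group ending at the first ".gif" followed by a quote.
    static final String RELUCTANT = ".*img src=\"(.*?)\\.gif\".*";

    // Returns capture group 1, or null when the pattern does not match.
    static String extract(String regex, String value) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return (m.find() && m.groupCount() > 0) ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("first:     " + extract(FIRST, DESCRIPTION));
        System.out.println("second:    " + extract(SECOND, DESCRIPTION));
        System.out.println("reluctant: " + extract(RELUCTANT, DESCRIPTION));
    }
}
```

Inside the XML config, the literal double quotes in the third pattern would
have to be written as &quot; so that the attribute value stays well-formed.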