Re: Features not present in Solr

2010-03-22 Thread David Smiley @MITRE.org

I use Endeca and Solr.

A few notable things in Endeca but not in Solr:
1. Real-time search.
2. "related record navigation" (RRN) is what they call it.  This is the
ability to join in other records, something Lucene/Solr definitely can't do.
3. A reference application for browsing/searching the data.
4. Data pipeline management software including a GUI tool to wire in
different paths.  I'm not a fan of this because the implementation sucks.
5. Hierarchical facets, including sifts (e.g. A-E, F-M, etc.) and attaching
user meta-data to nodes (such as an id you need or something).
6. XQuery based ad-hoc querying with XML output.
7. Aggregating (e.g. rolling-up) records.

IMO, the really notable things to appreciate are #1, #2, and #3, though
admittedly I'm not using #1 or #2.  I would consider them if money is not a
problem and you really need #1 or #2.

Endeca's bloat and product age are a problem.  You have to run a number of
installers, you have over a dozen PDFs and other help documents... I'm
sometimes wondering where the heck I read something and what installer
installed what.  It's like comparing Oracle with perhaps PostgreSQL.  And
it's really annoying to have to deal with Endeca "dimension ids" (numbers)
instead of Solr facet string literals because I find myself having to map
them all the time.  The native Java API sucks.  I could complain a lot more
(I've stopped myself multiple times while writing this) but this post would
get out of control.  It _is_ a capable product, but I'll take Solr over it
any day -- at least I understand basically all of what's going on in Solr. 
Of course I wrote the book on it so I'm biased ;-)

~ David Smiley
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book


Srikanth B wrote:
> 
> Hello
> 
> We are in the process of researching Solr features. I am looking for
> two things:
> 1. Features not available in Solr but present in other products
> like Endeca
> 2. What one shouldn't expect from Solr
> 
> Any thoughts ?
> 
> Thanks in advance
> Srikanth
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27996518.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr-ruby with clustering

2010-03-22 Thread mike anderson
False alarm. On the client side I was explicitly setting a shard,
which made my query (via solr-ruby) look like a distributed request
to Solr, and distributed requests aren't supported by the clustering
component.

cheers,
mike

On Mon, Mar 22, 2010 at 8:53 PM, mike anderson  wrote:
> Has anybody got solr-ruby to return a clustering result? (using the
> clustering component)
>
> I'm almost certain the query is correct (I check the solr logs for the
> query and run it in my browser, get back the cluster output as
> expected). But when I dump the response from my solr-ruby query the
> clustering output is nowhere to be found. I noticed that the
> clustering output has a data type of "Arr", where the response and
> other components have output of type "Lst", could this be the problem?
>
> If anyone can think of some other debugging I could try I'd love to hear it.
>
> Thanks in advance,
> Mike
>


solr-ruby with clustering

2010-03-22 Thread mike anderson
Has anybody got solr-ruby to return a clustering result? (using the
clustering component)

I'm almost certain the query is correct (I check the solr logs for the
query and run it in my browser, get back the cluster output as
expected). But when I dump the response from my solr-ruby query the
clustering output is nowhere to be found. I noticed that the
clustering output has a data type of "Arr", where the response and
other components have output of type "Lst", could this be the problem?

If anyone can think of some other debugging I could try I'd love to hear it.

Thanks in advance,
Mike


Re: synonyms problem

2010-03-22 Thread Lance Norskog
How large is the document, and how often does 'aberrant' appear in it?
Are the other words also in the document?

What is the full analysis stack? There might be interactions between
the SynonymFilter and other filters.

What does the admin/analysis.jsp page show? Does it throw OutOfMemory also?

Does stemming turn two of the terms into the same term?
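
One more thing that may be worth checking: the two synonyms.txt forms expand very differently. As far as I know, an explicit mapping (a=>b,c) only rewrites the left-hand term, while a plain comma-separated list with expand=true maps every term in the list to every other term. Below is a rough Python model of that difference, not Solr's actual parser; expand_rule is a made-up helper name, and whether the larger expansion explains the OutOfMemory here is only a guess.

```python
def expand_rule(rule, expand=True):
    """Return {input_term: [output_terms]} for one synonyms.txt line.

    Simplified model of the two rule forms; NOT the real SynonymFilter parser.
    """
    if "=>" in rule:
        # Explicit mapping: only the left-hand terms are rewritten.
        left, right = rule.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
        return {term: outputs for term in inputs}
    terms = [t.strip() for t in rule.split(",") if t.strip()]
    if expand:
        # Equivalent list with expand=true: every term maps to all terms.
        return {term: terms for term in terms}
    # Equivalent list with expand=false: every term maps to the first one.
    return {term: [terms[0]] for term in terms}


explicit = expand_rule("aberrant=>abnormal,unusual,deviant,anomalous,"
                       "peculiar,uncharacteristic,irregular,atypical")
equivalent = expand_rule("aberrant,abnormal,unusual,deviant,anomalous,"
                         "peculiar,uncharacteristic,irregular,atypical")

# Number of input terms and total number of emitted synonyms per form.
print(len(explicit), sum(len(v) for v in explicit.values()))
print(len(equivalent), sum(len(v) for v in equivalent.values()))
```

In this model the second form produces roughly ten times as many mappings from the same line, so a field where those words are frequent would emit many more tokens per document.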

On Mon, Mar 22, 2010 at 7:48 AM, Armando Ota  wrote:
> Have you tried increasing memory size ?
>
> we had some out of memory problems when we used default memory size ..
>
> Kind regards
>
> Armando
>
> michaelnazaruk wrote:
>>
>> Hi all! I have a little problem with synonyms.
>> When I set up my synonyms.txt file like this:
>>
>> aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
>> everything is fine! But if I set up the file like this:
>>
>> aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
>> I get an out-of-memory exception.
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Features not present in Solr

2010-03-22 Thread Lukáš Vlček
Hmm... sounds pretty much like what this book should be about (once
finished): http://www.manning.com/ingersoll/

On Mon, Mar 22, 2010 at 8:46 PM, Lance Norskog  wrote:

> About Text Analysis: "Natural Language Processing" is the more usual
> term. Finding parts of speech, isolating people's names, etc.
>
> On Mon, Mar 22, 2010 at 12:27 PM, Israel Ekpo 
> wrote:
> > On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog 
> wrote:
> >
> >> Web crawling.
> >
> >
> > I don't think Solr was designed with Web Crawling in mind. Nutch would be
> > more better suited for that, I believe.
> >
> >
> >> Text analysis.
> >>
> >
> > This is a bit vague.
> >
> > Please elaborate further. There is a lot of analysis (stemming, stop-word
> > removal, character transformation etc) that takes place already though
> > implicitly based on what fields you define and use in the schema.
> >
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >
> >
> >> Distributed index management.
> >> A fanatical devotion to the Pope.
> >>
> >> There a probably a lot of features already available in Solr out of the
> box
> > that most of those other "enterprise level" applications do not have yet.
> >
> > You would also be surprised to learn that a lot of them use Lucene under
> the
> > covers and are actually trying to re-implement what is already available
> in
> > Solr.
> >
> >
> >> On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
> >> >
> >> > Srikanth,
> >> >
> >> > I don't know anything about Endeca, so I can't compare Solr to it.
> >> > However, I know Solr is powerful. Very powerful.
> >> > So, maybe you should tell us more about your needs to get a good
> answer.
> >> >
> >> > As a response to your second question: You should not expect that Solr
> is
> >> > a database. It is an index-server. A database makes your data save. If
> >> there
> >> > goes something wrong - which is always possible - Solr gives no
> >> warranties.
> >> > Maybe someone other can tell you more about this topic.
> >> >
> >> > - Mitch
> >> >
> >> >
> >> > Srikanth B wrote:
> >> >>
> >> >> Hello
> >> >>
> >> >> We are in the process of researching on Solr features. I am looking
> for
> >> >> two
> >> >> things
> >> >> 1. Features not available in Solr but present in other
> products
> >> >> like
> >> >> Endeca
> >> >> 2. What one shouldn't not expect from Solr
> >> >>
> >> >> Any thoughts ?
> >> >>
> >> >> Thanks in advance
> >> >> Srikanth
> >> >>
> >> >>
> >> >
> >> > --
> >> > View this message in context:
> >>
> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> goks...@gmail.com
> >>
> >
> >
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
About Text Analysis: "Natural Language Processing" is the more usual
term. Finding parts of speech, isolating people's names, etc.

On Mon, Mar 22, 2010 at 12:27 PM, Israel Ekpo  wrote:
> On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog  wrote:
>
>> Web crawling.
>
>
> I don't think Solr was designed with Web Crawling in mind. Nutch would be
> more better suited for that, I believe.
>
>
>> Text analysis.
>>
>
> This is a bit vague.
>
> Please elaborate further. There is a lot of analysis (stemming, stop-word
> removal, character transformation etc) that takes place already though
> implicitly based on what fields you define and use in the schema.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>
>> Distributed index management.
>> A fanatical devotion to the Pope.
>>
>> There a probably a lot of features already available in Solr out of the box
> that most of those other "enterprise level" applications do not have yet.
>
> You would also be surprised to learn that a lot of them use Lucene under the
> covers and are actually trying to re-implement what is already available in
> Solr.
>
>
>> On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
>> >
>> > Srikanth,
>> >
>> > I don't know anything about Endeca, so I can't compare Solr to it.
>> > However, I know Solr is powerful. Very powerful.
>> > So, maybe you should tell us more about your needs to get a good answer.
>> >
>> > As a response to your second question: You should not expect that Solr is
>> > a database. It is an index-server. A database makes your data save. If
>> there
>> > goes something wrong - which is always possible - Solr gives no
>> warranties.
>> > Maybe someone other can tell you more about this topic.
>> >
>> > - Mitch
>> >
>> >
>> > Srikanth B wrote:
>> >>
>> >> Hello
>> >>
>> >> We are in the process of researching on Solr features. I am looking for
>> >> two
>> >> things
>> >>         1. Features not available in Solr but present in other products
>> >> like
>> >> Endeca
>> >>         2. What one shouldn't not expect from Solr
>> >>
>> >> Any thoughts ?
>> >>
>> >> Thanks in advance
>> >> Srikanth
>> >>
>> >>
>> >
>> > --
>> > View this message in context:
>> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
Lance Norskog
goks...@gmail.com


Re: Query interface

2010-03-22 Thread Lance Norskog
There are several response formats available for Solr:

http://wiki.apache.org/solr/QueryResponseWriter

Also, XSLT scripts and Velocity scripts are available for
pre-processing output formats.
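
For a quick client without XML parsing, the JSON writer (wt=json) is one option. A minimal Python sketch follows; the host, port and path in SOLR_SELECT are assumptions about a default local install, the helper names are made up, and the sample response is hand-written in the general shape the JSON writer returns rather than captured from a real server.

```python
import json
from urllib.parse import urlencode

# Assumed location of the select handler on a default local install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(query, writer="json", rows=10):
    """Build a select URL; urlencode percent-encodes the query as UTF-8."""
    params = {"q": query, "wt": writer, "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


def doc_ids(raw_json):
    """Pull the id of every returned document out of a JSON response body."""
    data = json.loads(raw_json)
    return [doc["id"] for doc in data["response"]["docs"]]


# Hand-written sample body, only to show the parsing step.
sample = ('{"response": {"numFound": 2, "start": 0, '
          '"docs": [{"id": "1"}, {"id": "2"}]}}')

print(build_query_url("name:m\u00fcller"))
print(doc_ids(sample))
```

Sending the umlaut percent-encoded as UTF-8 is only half of it: the servlet container has to decode the URL as UTF-8 too (on Tomcat that is the URIEncoding attribute of the connector), which may be where the umlaut problems come from.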

On Mon, Mar 22, 2010 at 9:00 AM, Armando Ota  wrote:
> Hey ...
>
> Thank you very much .. been struggling with this for hours now :(
>
> Will have to change the feature .. somehow :D
>
> Kind regards
>
> Armando
>
> Abdelhamid ABID wrote:
>>
>> Hi,
>> I think there isn't better than using XSLT as a mean to query solr and
>> render results.
>> Within an xslt file you would combine search form with search results in
>> one
>> place, by this way you free the server from the heavy duty tasks of xslt
>> transformation and let the client -which is in the most cases a browser-
>> do
>> the work.
>>
>> On 3/22/10, Gora Mohanty  wrote:
>>
>>>
>>> On Mon, 22 Mar 2010 15:26:41 +0100
>>> Sebastian Funk  wrote:
>>>
>>>

>>>> hey there,
>>>>
>>>> i've been using solr for some time now and set everything up the
>>>> way it's supposed to..
>>>> now for the user interface: simply writing a javascript (or
>>>> something else) website that passes the query-URL to solr and
>>>> interprets the XML given as a result. is that the easiest way?
>>>> i've noticed some problems with umlauts etc.. when using jetty or
>>>> tomcat as a server..
>>>>
>>>> is there another way to query solr and retrieve the results?

>>>
>>> [...]
>>>
>>> Many modern frameworks (I certainly know of Ruby on Rails, and
>>> Django), have Solr integrated via an application. I really like
>>> Django Haystack for how it offers an easy way to get started with
>>> various search back-ends, with a very Django-ish feel to the
>>> interface: http://haystacksearch.org/
>>>
>>> Regards,
>>>
>>> Gora
>>>
>>>
>>
>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
Whoops, yes it is in the wiki. A link from the admin page would be welcome.

On Mon, Mar 22, 2010 at 12:37 PM, Lance Norskog  wrote:
> There is a very cool debugger for the DataImportHandler:
>
> http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport
> debug jsp
>
> It is not mentioned on the wiki, nor are there any links to it in the
> Solr admin console.
>
> On Mon, Mar 22, 2010 at 8:36 AM, stocki  wrote:
>>
>> Helloo.
>>
>> i have the same database like in this example:
>> http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example
>>
>> this is my data-config.xml
>>
>> 
>>    >            query="select id, shop_id, is_active, order_index,
>> shop_item_number, manufacturer, name, ean, isbn, modified from shop_items">
>>       
>>       
>>       
>>       
>>       
>>       
>>       
>>       
>>           
>>           > dateTimeFormat="-MM-'hh:mm:ss'Z'" />
>>
>>                
>>                        > name="shop_category_id" />
>>
>>                        
>>                                
>>                        
>>                
>>
>>    
>>  
>>
>>
>> I have absolutely no idea why Solr didn't index the category name and
>> category_id...
>>
>> One product can have more than one value.
>>
>> Please help me, someone .. ^^ ;)
>>
>> --
>> View this message in context: 
>> http://old.nabble.com/DIH---Categories-not-indexed--tp27988126p27988126.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
There is a very cool debugger for the DataImportHandler:

http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport
debug jsp

It is not mentioned on the wiki, nor are there any links to it in the
Solr admin console.

On Mon, Mar 22, 2010 at 8:36 AM, stocki  wrote:
>
> Helloo.
>
> i have the same database like in this example:
> http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example
>
> this is my data-config.xml
>
> 
>                query="select id, shop_id, is_active, order_index,
> shop_item_number, manufacturer, name, ean, isbn, modified from shop_items">
>       
>       
>       
>       
>       
>       
>       
>       
>           
>            dateTimeFormat="-MM-'hh:mm:ss'Z'" />
>
>                
>                         name="shop_category_id" />
>
>                        
>                                
>                        
>                
>
>    
>  
>
>
> I have absolutely no idea why Solr didn't index the category name and
> category_id...
>
> One product can have more than one value.
>
> Please help me, someone .. ^^ ;)
>
> --
> View this message in context: 
> http://old.nabble.com/DIH---Categories-not-indexed--tp27988126p27988126.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Features not present in Solr

2010-03-22 Thread Israel Ekpo
On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog  wrote:

> Web crawling.


I don't think Solr was designed with Web Crawling in mind. Nutch would be
better suited for that, I believe.


> Text analysis.
>

This is a bit vague.

Please elaborate further. There is a lot of analysis (stemming, stop-word
removal, character transformation etc) that takes place already though
implicitly based on what fields you define and use in the schema.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


> Distributed index management.
> A fanatical devotion to the Pope.
>
There are probably a lot of features already available in Solr out of the box
that most of those other "enterprise level" applications do not have yet.

You would also be surprised to learn that a lot of them use Lucene under the
covers and are actually trying to re-implement what is already available in
Solr.


> On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
> >
> > Srikanth,
> >
> > I don't know anything about Endeca, so I can't compare Solr to it.
> > However, I know Solr is powerful. Very powerful.
> > So, maybe you should tell us more about your needs to get a good answer.
> >
> > As a response to your second question: You should not expect that Solr is
> > a database. It is an index-server. A database makes your data save. If
> there
> > goes something wrong - which is always possible - Solr gives no
> warranties.
> > Maybe someone other can tell you more about this topic.
> >
> > - Mitch
> >
> >
> > Srikanth B wrote:
> >>
> >> Hello
> >>
> >> We are in the process of researching on Solr features. I am looking for
> >> two
> >> things
> >> 1. Features not available in Solr but present in other products
> >> like
> >> Endeca
> >> 2. What one shouldn't not expect from Solr
> >>
> >> Any thoughts ?
> >>
> >> Thanks in advance
> >> Srikanth
> >>
> >>
> >
> > --
> > View this message in context:
> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Features not present in Solr

2010-03-22 Thread Lukáš Vlček
On Mon, Mar 22, 2010 at 8:16 PM, Lance Norskog  wrote:

> Web crawling.
>

Nutch, Lucene Connectors Framework... would it help to include these directly
in the Solr code base?


> Text analysis.
>

Under development I think, see Mahout (check some proposed GSoC tickets in
JIRA)


> Distributed index management.
> A fanatical devotion to the Pope.
>
> On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
> >
> > Srikanth,
> >
> > I don't know anything about Endeca, so I can't compare Solr to it.
> > However, I know Solr is powerful. Very powerful.
> > So, maybe you should tell us more about your needs to get a good answer.
> >
> > As a response to your second question: You should not expect that Solr is
> > a database. It is an index-server. A database makes your data save. If
> there
> > goes something wrong - which is always possible - Solr gives no
> warranties.
> > Maybe someone other can tell you more about this topic.
> >
> > - Mitch
> >
> >
> > Srikanth B wrote:
> >>
> >> Hello
> >>
> >> We are in the process of researching on Solr features. I am looking for
> >> two
> >> things
> >> 1. Features not available in Solr but present in other products
> >> like
> >> Endeca
> >> 2. What one shouldn't not expect from Solr
> >>
> >> Any thoughts ?
> >>
> >> Thanks in advance
> >> Srikanth
> >>
> >>
> >
> > --
> > View this message in context:
> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Question about query

2010-03-22 Thread Erick Erickson
One thing I've seen suggested is to add the number of values to
a separate field, say topic_count. Then, in your situation above,
you could append "AND topic_count:1". This can be extended
to match any exact set of values (and only
that set). For instance,
topic:5 AND topic:10 AND topic:20 AND topic_count:3 would
give you article 4.

Don't know if this works in your particular situation.
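
A rough sketch of the idea in Python, using the articles from your mail. The topic_count field and the helper names are assumptions for illustration; matches() only mimics locally what the query is meant to select and is not how Solr evaluates it. It also assumes a document never lists the same topic twice.

```python
# Articles and their topic values, taken from the original question.
ARTICLES = {
    1: [1, 5, 10, 20, 24],
    2: [0],
    3: [1],
    4: [5, 10, 20],
    5: [1, 13, 19],
}


def to_solr_doc(article_id, topics):
    """Add the hypothetical topic_count field at index time."""
    return {"id": article_id, "topic": topics, "topic_count": len(topics)}


def exact_topics_query(topics):
    """Query clause for documents holding exactly these topic values."""
    clauses = ["topic:%d" % t for t in topics]
    clauses.append("topic_count:%d" % len(topics))
    return "(" + " AND ".join(clauses) + ")"


def matches(doc, topics):
    """Local stand-in for the clause above, used to check the idea."""
    has_all = all(t in doc["topic"] for t in topics)
    return has_all and doc["topic_count"] == len(topics)


docs = [to_solr_doc(i, t) for i, t in ARTICLES.items()]
query = exact_topics_query([0]) + " OR " + exact_topics_query([1])
hits = [d["id"] for d in docs if matches(d, [0]) or matches(d, [1])]

print(query)
print(hits)
```

The cost of this approach is that topic_count has to be recomputed whenever a document's topics change, since it is just another indexed field.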

Erick

On Mon, Mar 22, 2010 at 10:32 AM, Armando Ota  wrote:

> Hi
>
> I need a little help with query for my problem (if it can be solved)
>
> I have a field in a document called topic
>
> this field contains some values, 0 (for no topic) or  1 (topic 1), 2, 3,
> etc ...
>
> It can contain many values like 1, 10, 50, etc (for 1 doc)
>
> So now to the problem:
> I would like to get documents that have 0 for topic value and documents
> that only have for example 1 for topic value inserted
>
> articles for example:
> article 1 topics: 1, 5, 10, 20, 24
> article 2 topics: 0
> article 3 topics: 1
> article 4 topics: 5, 10, 20
> article 5 topics: 1, 13, 19
>
> So I need search query to return me only article 2 and 3 not other articles
> with 1 for topic value
>
> Can that be done ? Any help appreciated
>
> Kind regards
>
> Armando
>
>


Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
Web crawling.
Text analysis.
Distributed index management.
A fanatical devotion to the Pope.

On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
>
> Srikanth,
>
> I don't know anything about Endeca, so I can't compare Solr to it.
> However, I know Solr is powerful. Very powerful.
> So, maybe you should tell us more about your needs to get a good answer.
>
> As a response to your second question: you should not expect Solr to be
> a database. It is an index server. A database keeps your data safe. If
> something goes wrong - which is always possible - Solr gives no guarantees.
> Maybe someone else can tell you more about this topic.
>
> - Mitch
>
>
> Srikanth B wrote:
>>
>> Hello
>>
>> We are in the process of researching on Solr features. I am looking for
>> two
>> things
>>         1. Features not available in Solr but present in other products
>> like
>> Endeca
>>         2. What one shouldn't expect from Solr
>>
>> Any thoughts ?
>>
>> Thanks in advance
>> Srikanth
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-22 Thread stocki

I patched a nightly build of Solr.
The patch applies and the classes are in the correct folder, but when I replace
the spellcheck component with this one, as shown in the comments, Solr cannot
find the classes =(



  suggest
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.jaspell.JaspellLookup
  text
  american-english

  


--> SCHWERWIEGEND (SEVERE): org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.spelling.suggest.Suggester'


Why is that? I don't think anyone has as much trouble applying a patch as
I do =( :D


Andrzej Bialecki wrote:
> 
> On 2010-03-19 13:03, stocki wrote:
>>
>> hello..
>>
>> i try to implement autosuggest component from these link:
>> http://issues.apache.org/jira/browse/SOLR-1316
>>
>> but i have no idea how to do this !?? can anyone get me some tipps ?
> 
> Please follow the instructions outlined in the JIRA issue, in the 
> comment that shows fragments of XML config files.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki <><
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/SOLR-1316-How-To-Implement-this-autosuggest-component-tp27950949p27990809.html



Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Ahmet Arslan

> Thank you. I tried that but it did
> not work to remove trailing spaces.
> I believe this is why my size facet queries are not
> working. After
> reloading, the XML result entries still have:
> 
> 
> LARGE     
> MEDIUM    
> SMALL     
> 
> 
> I am using this:
> 
>     
>      class="solr.StandardTokenizerFactory"/>
>     
> 
> 
> And here is my size field:
>      indexed="true" stored="true"
> multiValued="true" required="false"/>

The problem is that you are using the string type (type="string") here, which
is not analyzed. It should be:






 


Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Willie Whitehead
Thank you. I tried that but it did not work to remove trailing spaces.
I believe this is why my size facet queries are not working. After
reloading, the XML result entries still have:


LARGE 
MEDIUM
SMALL 


I am using this:






And here is my size field:




I did not know what difference this makes:


vs this:



But it appears I do not need that part.





On Mon, Mar 22, 2010 at 2:12 PM, Ahmet Arslan  wrote:
>
>> In my schema.xml, I am trying to remove whitespace from a
>> multivalued
>> field as they come from the database. Is this the correct
>> way:
>>
>>    > class="solr.TextField">
>>       
>>         > class="solr.StandardTokenizerFactory"/>
>>         > class="solr.TrimFilterFactory" />
>>       
>>     
>>
>> I do not believe this is working.
>
> TrimFilterFactory trims leading and trailing white-spaces. But 
> StandardTokenizerFactory already eats up white-spaces. In other words it is 
> meaningless to use it with StandardTokenizerFactory.
>
> In your field type definition you specified only query analyzer but not index 
> analyzer. You can use this directly:
>
> 
>     
>     
>     
> 
>
> What do you mean by removing whitespace from a multivalued field as they come 
> from the database?
>
>
>
>


Re: Correct way to use tokenizer for whitespace

2010-03-22 Thread Ahmet Arslan

> In my schema.xml, I am trying to remove whitespace from a
> multivalued
> field as they come from the database. Is this the correct
> way:
> 
>     class="solr.TextField">
>       
>          class="solr.StandardTokenizerFactory"/>
>          class="solr.TrimFilterFactory" />
>       
>     
> 
> I do not believe this is working.

TrimFilterFactory trims leading and trailing whitespace. But 
StandardTokenizerFactory already discards whitespace, so it is 
pointless to use the two together.

In your field type definition you specified only a query analyzer but no index 
analyzer. You can use this directly:


    
            
    


What do you mean by removing whitespace from a multivalued field as they come 
from the database?
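
To illustrate why the trim filter has nothing left to do after a whitespace-splitting tokenizer, here is a small Python simulation. It is not Solr code: standard_like_tokenize is a crude stand-in that just splits on whitespace, keyword_tokenize stands in for a tokenizer that keeps the whole value as one token (as KeywordTokenizerFactory does), and trim_filter stands in for TrimFilterFactory. The sample value is made up.

```python
def standard_like_tokenize(value):
    """Crude stand-in for a tokenizer that splits on (and drops) whitespace."""
    return value.split()


def keyword_tokenize(value):
    """Stand-in for a tokenizer that emits the whole value as one token."""
    return [value]


def trim_filter(tokens):
    """Stand-in for a trim filter: strip both ends of each token."""
    return [t.strip() for t in tokens]


raw = "LARGE     "  # value with trailing spaces, as it might come from a database

after_standard = trim_filter(standard_like_tokenize(raw))
before_trim = keyword_tokenize(raw)
after_keyword = trim_filter(keyword_tokenize(raw))

print(after_standard)  # tokens are already clean, trimming changed nothing
print(before_trim)     # single token, trailing spaces still attached
print(after_keyword)   # single token, trimmed
```

Note that analysis only changes the indexed terms. The stored value that comes back in the XML result is the original input, so trailing spaces in the returned field would have to be trimmed before the data reaches Solr, for example in the SQL query.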





Correct way to use tokenizer for whitespace

2010-03-22 Thread Willie Whitehead
Hi,

In my schema.xml, I am trying to remove whitespace from a multivalued
field as they come from the database. Is this the correct way:

   
  


  


I do not believe this is working.

Thanks!


Re: Multi Select Facets through Java API

2010-03-22 Thread homerlex

With your example I got it working nicely with addFacetField and
addFilterQuery in the API.

Thanks, I appreciate the help.
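
For anyone finding this later, here is a rough Python sketch of how the parameter strings from the example quoted below fit together. The helper name multiselect_params is made up; the tag, key and field values are the ones from that example. With the Java API, the same fq and facet.field strings are what I passed to addFilterQuery and addFacetField.

```python
from urllib.parse import urlencode, parse_qs


def multiselect_params(main_query, field, selected, tag, key):
    """Build request parameters for a multi-select facet on one field.

    The filter query is tagged, and the facet excludes that tag so its
    counts are computed as if the filter were not applied.
    """
    ored_values = " OR ".join("%s:%s" % (field, v) for v in selected)
    filter_query = "{!tag=%s}%s" % (tag, ored_values)
    facet_field = "{!ex=%s key=%s}%s" % (tag, key, field)
    return [
        ("q", main_query),
        ("fq", filter_query),
        ("facet", "on"),
        ("facet.field", facet_field),
    ]


params = multiselect_params("mainquery", "cars", ["corvette", "camaro"],
                            tag="carfq", key="carfacet")
query_string = urlencode(params)

print(dict(params)["fq"])
print(dict(params)["facet.field"])
print(query_string)
```

Building the strings in one place keeps the tag in the filter query and the ex= in the facet from drifting apart, which is the easiest mistake to make here.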




Britske wrote:
> 
> something like this?
> 
> q=mainquery&fq={!tag=carfq}cars:corvette OR
> cars:camaro&facet=on&facet.field={!ex=carfq key=carfacet}cars
> 
> - the facet "carfacet" is independent of the filter query that filters on
> cars.
> -you construct the filter query (fq={!tag=carfq}cars:corvette OR
> cars:camaro) yourself in your application layer.
> 
> perhaps a disadvantage is that you get a lot of different filter queries
> which are all independently cached... I don't see any other way at the
> moment though..
> 
> Geert-Jan
> 
> 
> 
> 2010/3/22 homerlex 
> 
>>
>> bump - anyone?
>> --
>> View this message in context:
>> http://old.nabble.com/Multi-Select-Facets-through-Java-API-tp27951014p27986301.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Multi-Select-Facets-through-Java-API-tp27951014p27989508.html



Re: use termscomponent like spellComponent ?!

2010-03-22 Thread stocki

thx.

I tried to patch Solr with SOLR-1316 but it does not work =( 

Do I need to check out the nightly from svn? 
http://svn.apache.org/repos/asf/lucene/solr/ 

When I apply the patch and then build the WAR, it is only 40 MB ...




Grant Ingersoll-6 wrote:
> 
> See https://issues.apache.org/jira/browse/SOLR-1316
> 
> 
> On Mar 21, 2010, at 2:34 PM, stocki wrote:
> 
>> 
>> hello.
>> 
>> I've been playing with Solr but haven't found the perfect solution for me.
>> 
>> My goal is a search like the Amazon search in the iPhone app. ;)
>> 
>> Is it possible to use the TermsComponent like the SpellComponent, so
>> that the TermsComponent works with more than a single term?
>> 
>> I have these 3 docs, by name, in my index:
>> - nikon one
>> - nikon two
>> - nikon three
>> 
>> When I search for "nik", the TermsComponent suggests "nikon". That's
>> exactly what I want.
>> But when I type "nikon on", I want Solr to suggest "nikon one".
>> 
>> How can that be done? Please help me, somebody ;) 
>> 
>> I think a merge of TC and SC would be the best solution.
>> 
>> > required="true" /> 
>> This is my search field. Did I use the correct type? 
>> 
>> 
>> -- 
>> View this message in context:
>> http://old.nabble.com/use-termscomponent-like-spellComponent--%21-tp27977008p27977008.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/use-termscomponent-like-spellComponent--%21-tp27977008p27988620.html



Re: Query interface

2010-03-22 Thread Armando Ota

Hey ...

Thank you very much .. been struggling with this for hours now :(

Will have to change the feature .. somehow :D

Kind regards

Armando

Abdelhamid ABID wrote:

> Hi,
> I think there isn't better than using XSLT as a mean to query solr and
> render results.
> Within an xslt file you would combine search form with search results in one
> place, by this way you free the server from the heavy duty tasks of xslt
> transformation and let the client -which is in the most cases a browser- do
> the work.
>
> On 3/22/10, Gora Mohanty  wrote:
>> On Mon, 22 Mar 2010 15:26:41 +0100
>> Sebastian Funk  wrote:
>>
>>> hey there,
>>>
>>> i've been using solr for some time now and set everything up the
>>> way it's supposed to..
>>> now for the user interface: simply writing a javascript (or
>>> something else) website that passes the query-URL to solr and
>>> interprets the XML given as a result. is that the easiest way?
>>> i've noticed some problems with umlauts etc.. when using jetty or
>>> tomcat as a server..
>>>
>>> is there another way to query solr and retrieve the results?
>>
>> [...]
>>
>> Many modern frameworks (I certainly know of Ruby on Rails, and
>> Django), have Solr integrated via an application. I really like
>> Django Haystack for how it offers an easy way to get started with
>> various search back-ends, with a very Django-ish feel to the
>> interface: http://haystacksearch.org/
>>
>> Regards,
>>
>> Gora
>>



Re: Query interface

2010-03-22 Thread Abdelhamid ABID
Hi,
I think there is nothing better than using XSLT as a means to query Solr and
render results.
Within an XSLT file you can combine the search form with the search results in
one place. This way you free the server from the heavy-duty task of XSLT
transformation and let the client, which in most cases is a browser, do
the work.

On 3/22/10, Gora Mohanty  wrote:
>
> On Mon, 22 Mar 2010 15:26:41 +0100
> Sebastian Funk  wrote:
>
> > hey there,
> >
> > i've been using solr for some time now and set everything up the
> > way it's supposed to..
> > now for the user interface: simply writing a javascript (or
> > something else) website that passes the query-URL to solr and
> > interprets the XML given as a result. is that the easiest way?
> > i've noticed some problems with umlauts etc.. when using jetty or
> > tomcat as a server..
> >
> > is there another way to query solr and retrieve the results?
>
> [...]
>
> Many modern frameworks (I certainly know of Ruby on Rails, and
> Django), have Solr integrated via an application. I really like
> Django Haystack for how it offers an easy way to get started with
> various search back-ends, with a very Django-ish feel to the
> interface: http://haystacksearch.org/
>
> Regards,
>
> Gora
>



-- 
Abdelhamid ABID
Software Engineer- J2EE / WEB / ESB MULE


Re: use termscomponent like spellComponent ?!

2010-03-22 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/SOLR-1316


On Mar 21, 2010, at 2:34 PM, stocki wrote:

> 
> hello.
> 
> i play with solr but i didn`t find the perfect solution for me.
> 
> my goal is a search like the amazonsearch from the iPhoneApp. ;)
> 
> it is possible to use the TermsComponent like the SpellComponent ? So, that
> works termsComp with more than one single Term ?!  
> 
> i got these 3 docs with the name in my index:
> - nikon one
> - nikon two
> - nikon three
> 
> so when ich search for "nik" termsCom suggest me  "nikon". thats correctly
> whar i want.
> but when i type "nikon on" i want that solr suggest me "nikon one" , 
> 
> how is that realizable ??? pleeease help me somebody ;) 
> 
> a merge of TC nad SC where best solution in think so.
> 
>  required="true" /> 
> this is my searchfield. did i use the correct type ? 
> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/use-termscomponent-like-spellComponent--%21-tp27977008p27977008.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Question about query

2010-03-22 Thread Armando Ota

Hey
Thank you for your reply, but it's not working. I still get the other
articles.


Kind regards

Armando

Abdelhamid ABID wrote:

Well, here is what I figured out:

(mm=1<50% , qf=topic , q="1" "0" ) ==> q=topic:0 or topic:1


On 3/22/10, Armando Ota  wrote:
  

Hi

I need a little help with query for my problem (if it can be solved)

I have a field in a document called topic

this field contains some values, 0 (for no topic) or  1 (topic 1), 2, 3,
etc ...

It can contain many values like 1, 10, 50, etc (for 1 doc)

So now to the problem:
I would like to get documents that have 0 for topic value and documents
that only have for example 1 for topic value inserted

articles for example:
article 1topics: 1, 5, 10, 20, 24
article 2 topics: 0
article 3 topics: 1
article 4 topic: 5, 10, 20
article 5 topic: 1, 13, 19

So I need search query to return me only article 2 and 3 not other articles
with 1 for topic value

Can that be done ? Any help appreciated

Kind regards

Armando






  


DIH - Categories not indexed ????

2010-03-22 Thread stocki

Hello.

I have the same database as in this example:
http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example

This is my data-config.xml:



   
   
   
   
   
   
   
  
  
   
   
   






   


  


I have absolutely no idea why Solr didn't index the category name and
category_id...

One product can have more than one value.

Please help me, someone .. ^^ ;)

-- 
View this message in context: 
http://old.nabble.com/DIH---Categories-not-indexed--tp27988126p27988126.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about query

2010-03-22 Thread Abdelhamid ABID
Well, here is what I figured out:

(mm=1<50% , qf=topic , q="1" "0" ) ==> q=topic:0 or topic:1
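
One way to express "topic 0, or topic 1 and nothing else" might be to store the number of topics per document in an extra field and require it to be 1. The sketch below is only an illustration in Python: the `topic_count` field is hypothetical (it is not part of the schema described in this thread), and the filter is applied in memory to the five example articles rather than sent to Solr.

```python
# Example articles from the question: article id -> topic values
articles = {
    1: [1, 5, 10, 20, 24],
    2: [0],
    3: [1],
    4: [5, 10, 20],
    5: [1, 13, 19],
}

def with_topic_count(doc_id, topics):
    """Build a document carrying a hypothetical topic_count field."""
    return {"id": doc_id, "topic": topics, "topic_count": len(topics)}

def matches(doc, wanted=1):
    """Mirror the intended logic: topic 0, or only the wanted topic."""
    has_no_topic = 0 in doc["topic"]
    only_wanted = wanted in doc["topic"] and doc["topic_count"] == 1
    return has_no_topic or only_wanted

docs = [with_topic_count(doc_id, topics) for doc_id, topics in articles.items()]
selected = [doc["id"] for doc in docs if matches(doc)]
print(selected)
```

If such a field were added at index time, the corresponding Solr query would presumably be `topic:0 OR (topic:1 AND topic_count:1)`; that query has not been tested against a real index.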


On 3/22/10, Armando Ota  wrote:
>
> Hi
>
> I need a little help with query for my problem (if it can be solved)
>
> I have a field in a document called topic
>
> this field contains some values, 0 (for no topic) or  1 (topic 1), 2, 3,
> etc ...
>
> It can contain many values like 1, 10, 50, etc (for 1 doc)
>
> So now to the problem:
> I would like to get documents that have 0 for topic value and documents
> that only have for example 1 for topic value inserted
>
> articles for example:
> article 1topics: 1, 5, 10, 20, 24
> article 2 topics: 0
> article 3 topics: 1
> article 4 topic: 5, 10, 20
> article 5 topic: 1, 13, 19
>
> So I need search query to return me only article 2 and 3 not other articles
> with 1 for topic value
>
> Can that be done ? Any help appreciated
>
> Kind regards
>
> Armando
>
>


-- 
Elsadek
Software Engineer- J2EE / WEB / ESB MULE


Re: synonyms problem

2010-03-22 Thread Armando Ota

Have you tried increasing the memory size?

We had some out-of-memory problems when we used the default memory size.

Kind regards

Armando

michaelnazaruk wrote:

Hi all! I have a little problem with synonyms.
When I set up my synonyms.txt file like this:
aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
everything is fine. But if I set it up like this:
aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
I get an out-of-memory exception.

  


synonyms problem

2010-03-22 Thread michaelnazaruk

Hi all! I have a little problem with synonyms.
When I set up my synonyms.txt file like this:
aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
everything is fine. But if I set it up like this:
aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
I get an out-of-memory exception.
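
The two line formats are not equivalent, which might be part of the explanation. An explicit mapping (`a=>b,c`) rewrites only the left-hand term, while a plain comma-separated list treats all terms as equivalent and, with `expand="true"` on the synonym filter, maps every term to every term in the list. The Python sketch below is a simplified model of that documented behaviour, not Solr's actual parser; the function name is made up for illustration.

```python
def parse_synonym_line(line, expand=True):
    """Return {input term: [output terms]} for one synonyms.txt line."""
    if "=>" in line:
        # Explicit mapping: only the left-hand terms are rewritten.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
        return {term: list(outputs) for term in inputs}
    # Equivalent-synonym list.
    terms = [t.strip() for t in line.split(",") if t.strip()]
    if expand:
        # Every term maps to the full list, itself included.
        return {term: list(terms) for term in terms}
    # Without expansion, every term is reduced to the first one.
    return {term: [terms[0]] for term in terms}

words = ("abnormal,unusual,deviant,anomalous,peculiar,"
         "uncharacteristic,irregular,atypical")
explicit = parse_synonym_line("aberrant=>" + words)
equivalent = parse_synonym_line("aberrant," + words)

for name, mapping in (("explicit", explicit), ("equivalent", equivalent)):
    outputs = sum(len(v) for v in mapping.values())
    print(name, len(mapping), "input terms,", outputs, "output terms")
```

For this single line the expanded form yields 9 input terms and 81 outputs instead of 1 and 8. Whether that growth accounts for the out-of-memory exception depends on the size of the full synonyms file, which the thread does not state.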

-- 
View this message in context: 
http://old.nabble.com/synonyms-problem-tp27987378p27987378.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Ross
I thought you might ask that :-)

It's because the pdf files are scanned from paper documents and OCR'd
to produce text. They still contain the image so are huge. The smaller
files are about 40 MB and cause a Java out of heap memory error. The
larger files are getting close to 500 MB. I didn't have anything to do
with the scanning. I'm guessing but it seems that something in the
Tomcat / Solr / Tika implementation tries to load it all into memory
at once.

pdftotext (part of http://www.foolabs.com/xpdf/download.html ) seems
to do it nicely and processes small chunks at a time.

Ross


On Mon, Mar 22, 2010 at 9:43 AM, Erik Hatcher  wrote:
> Why not feed the original PDF files in instead?  Just curious if pdftotext
> is doing a better job than Tika's PDFBox stuff.
>
>        Erik
>
> On Mar 22, 2010, at 9:30 AM, Ross wrote:
>
>> Thanks Georg
>>
>> I don't think it's that because it crashes on a one word test file I
>> create using the nano editor. I don't think nano is adding anything
>> extra.
>>
>> My real files are created by a Windows utility called pdftotext. I
>> solved the problem by getting pdftotext to generate html files rather
>> than plain text. It just adds an html header and wraps everything in a
>>  tag. That seems to keep Solr happy.
>>
>> Ross
>>
>> On Mon, Mar 22, 2010 at 9:08 AM, György Frivolt
>>  wrote:
>>>
>>> Hi,
>>>
>>>   I had problem with indexing documents some months ago as well. I found
>>> that there were XML control characters in the documents and these were
>>> not
>>> handled by Solr. Maybe it is the case for you as well.
>>>
>>> Regards,
>>>
>>>   Georg
>>>
>>>
>>> On Sun, Mar 21, 2010 at 5:58 PM, Ross  wrote:
>>>
>>>> Hi all
>>>>
>>>> I'm trying to import some text files. I'm mostly following Avi
>>>> Rappoport's tutorial.  Some of my files cause Solr to crash while
>>>> indexing. I've narrowed it down to a very simple example.
>>>>
>>>> I have a file named test.txt with one line. That line is the word
>>>> XXBLE and nothing else
>>>>
>>>> This is the command I'm using.
>>>>
>>>> curl "
>>>> http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true
>>>> "
>>>> -F "myfi...@test.txt"
>>>>
>>>> The result is pasted below. Other files work just fine. The problem
>>>> seems to be related to the letters B and E. If I change them to
>>>> something else or make them lower case then it works. In my real
>>>> files, the XX is something else but the result is the same. It's a
>>>> common word in the files. I guess for this "quick and dirty" job I'm
>>>> doing I could do a bulk replace in the files to make it lower case.
>>>>
>>>> Is there any workaround for this?
>>>>
>>>> Thanks
>>>> Ross
>>>>
>>>> Apache Tomcat/6.0.20 - Error
>>>> report HTTP Status 500 -
>>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>>>> from org.apache.tika.parser.txt.txtpar...@19ccba
>>>>
>>>> org.apache.solr.common.SolrException:
>>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>>>> from org.apache.tika.parser.txt.txtpar...@19ccba
>>>>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>>>>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>        at org.apache

Question about query

2010-03-22 Thread Armando Ota

Hi

I need a little help with query for my problem (if it can be solved)

I have a field in a document called topic

this field contains some values, 0 (for no topic) or  1 (topic 1), 2, 3, 
etc ...


It can contain many values like 1, 10, 50, etc (for 1 doc)

So now to the problem:
I would like to get documents that have 0 as the topic value, and documents
that have only one specific topic value, for example only 1.


Example articles:
article 1 topics: 1, 5, 10, 20, 24
article 2 topics: 0
article 3 topics: 1
article 4 topics: 5, 10, 20
article 5 topics: 1, 13, 19

So I need a search query that returns only articles 2 and 3, not the other
articles that have 1 among their topic values.


Can that be done ? Any help appreciated

Kind regards

Armando



Re: Query interface

2010-03-22 Thread Gora Mohanty
On Mon, 22 Mar 2010 15:26:41 +0100
Sebastian Funk  wrote:

> hey there,
> 
> i've been using solr for some time now and set everything up the
> way it's supposed to..
> now for the user interface: simply writing a javascript (or
> something else) website that passes the query-URL to solr and
> interprets the XML given as a result. is that the easiest way?
> i've noticed some problems with umlauts etc.. when using jetty or
> tomcat as a server..
> 
> is there another way to query solr and retrieve the results?
[...]

Many modern frameworks (I certainly know of Ruby on Rails, and
Django), have Solr integrated via an application. I really like
Django Haystack for how it offers an easy way to get started with
various search back-ends, with a very Django-ish feel to the
interface: http://haystacksearch.org/

Regards,
Gora
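
On the umlaut problem: one thing worth checking is that the query string is percent-encoded as UTF-8 before it reaches Solr. The Python sketch below only builds the request URL; the host, port and path are assumptions and need to be adapted to the actual installation, and the helper name is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL; adjust host, port and path to your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_query_url(user_query, rows=10, start=0):
    """Return a select URL with the query percent-encoded as UTF-8."""
    params = {
        "q": user_query,
        "rows": rows,
        "start": start,
        "wt": "json",  # ask for JSON instead of the default XML
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_query_url("müller straße")
print(url)

# Round trip: decoding the query string gives back the original text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```

If the URL is encoded correctly and umlauts still break, the servlet container may be decoding the URI with a different charset; for Tomcat the usual suggestion is to set `URIEncoding="UTF-8"` on the connector.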


Query interface

2010-03-22 Thread Sebastian Funk

hey there,

i've been using solr for some time now and set everything up the way  
it's supposed to..
now for the user interface: simply writing a javascript (or something  
else) website that passes the query-URL to solr and interprets the XML  
given as a result. is that the easiest way? i've noticed some problems  
with umlauts etc.. when using jetty or tomcat as a server..


is there another way to query solr and retrieve the results?

thanks for any help,
sebastian funk


Re: Multi Select Facets through Java API

2010-03-22 Thread Geert-Jan Brits
something like this?

q=mainquery&fq={!tag=carfq}cars:corvette OR
cars:camaro&facet=on&facet.field={!ex=carfq key=carfacet}cars

-the facet: "carfacet" is independent of the filter query that filters on cars.
-you construct the filter query (fq={!tag=carfq}cars:corvette OR
cars:camaro) yourself in your application layer.

perhaps a disadvantage is that you get a lot of different filter queries
which are all independently cached... I don't see any other way at the
moment though..

Geert-Jan
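
A rough sketch of how the application layer could assemble those parameters, written in Python for brevity. The function name and defaults are made up for illustration; the parameter values are the ones from the example above. With SolrJ the same two strings would presumably be passed as the filter query and the facet field of the query object.

```python
from urllib.parse import urlencode

def multi_select_params(main_query, field, selected,
                        tag="carfq", key="carfacet"):
    """Build request parameters for a multi-select facet on one field."""
    # Filter query, tagged so that the facet can exclude it again.
    fq = "{!tag=%s}" % tag + " OR ".join(
        "%s:%s" % (field, value) for value in selected)
    # Facet on the same field, ignoring the tagged filter.
    facet_field = "{!ex=%s key=%s}%s" % (tag, key, field)
    return [
        ("q", main_query),
        ("fq", fq),
        ("facet", "on"),
        ("facet.field", facet_field),
    ]

params = multi_select_params("mainquery", "cars", ["corvette", "camaro"])
for name, value in params:
    print(name, "=", value)

# Encoded form, ready to append to the select URL.
print(urlencode(params))
```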



2010/3/22 homerlex 

>
> bump - anyone?
> --
> View this message in context:
> http://old.nabble.com/Multi-Select-Facets-through-Java-API-tp27951014p27986301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Erik Hatcher
Why not feed the original PDF files in instead?  Just curious if  
pdftotext is doing a better job than Tika's PDFBox stuff.


Erik

On Mar 22, 2010, at 9:30 AM, Ross wrote:


Thanks Georg

I don't think it's that because it crashes on a one word test file I
create using the nano editor. I don't think nano is adding anything
extra.

My real files are created by a Windows utility called pdftotext. I
solved the problem by getting pdftotext to generate html files rather
than plain text. It just adds an html header and wraps everything in a
 tag. That seems to keep Solr happy.

Ross

On Mon, Mar 22, 2010 at 9:08 AM, György Frivolt
 wrote:

Hi,

   I had problem with indexing documents some months ago as well. I  
found
that there were XML control characters in the documents and these  
were not

handled by Solr. Maybe it is the case for you as well.

Regards,

   Georg


On Sun, Mar 21, 2010 at 5:58 PM, Ross  wrote:


Hi all

I'm trying to import some text files. I'm mostly following Avi
Rappoport's tutorial.  Some of my files cause Solr to crash while
indexing. I've narrowed it down to a very simple example.

I have a file named test.txt with one line. That line is the word
XXBLE and nothing else

This is the command I'm using.

curl "
http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true
"
-F "myfi...@test.txt"

The result is pasted below. Other files work just fine. The problem
seems to be related to the letters B and E. If I change them to
something else or make them lower case then it works. In my real
files, the XX is something else but the result is the same. It's a
common word in the files. I guess for this "quick and dirty" job I'm
doing I could do a bulk replace in the files to make it lower case.

Is there any workaround for this?

Thanks
Ross

Apache Tomcat/6.0.20 - Error
report HTTP Status 500 -
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.txt.txtpar...@19ccba

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.txt.txtpar...@19ccba
   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException fr

Re: Solr crashing while extracting from very simple text file

2010-03-22 Thread Ross
Thanks Georg

I don't think it's that because it crashes on a one word test file I
create using the nano editor. I don't think nano is adding anything
extra.

My real files are created by a Windows utility called pdftotext. I
solved the problem by getting pdftotext to generate html files rather
than plain text. It just adds an html header and wraps everything in a
 tag. That seems to keep Solr happy.

Ross

On Mon, Mar 22, 2010 at 9:08 AM, György Frivolt
 wrote:
> Hi,
>
>    I had problem with indexing documents some months ago as well. I found
> that there were XML control characters in the documents and these were not
> handled by Solr. Maybe it is the case for you as well.
>
> Regards,
>
>    Georg
>
>
> On Sun, Mar 21, 2010 at 5:58 PM, Ross  wrote:
>
>> Hi all
>>
>> I'm trying to import some text files. I'm mostly following Avi
>> Rappoport's tutorial.  Some of my files cause Solr to crash while
>> indexing. I've narrowed it down to a very simple example.
>>
>> I have a file named test.txt with one line. That line is the word
>> XXBLE and nothing else
>>
>> This is the command I'm using.
>>
>> curl "
>> http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true
>> "
>> -F "myfi...@test.txt"
>>
>> The result is pasted below. Other files work just fine. The problem
>> seems to be related to the letters B and E. If I change them to
>> something else or make them lower case then it works. In my real
>> files, the XX is something else but the result is the same. It's a
>> common word in the files. I guess for this "quick and dirty" job I'm
>> doing I could do a bulk replace in the files to make it lower case.
>>
>> Is there any workaround for this?
>>
>> Thanks
>> Ross
>>
>> Apache Tomcat/6.0.20 - Error
>> report HTTP Status 500 -
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.txt.txtpar...@19ccba
>>
>> org.apache.solr.common.SolrException:
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.txt.txtpar...@19ccba
>>        at
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>>        at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>        at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>        at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>        at
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>        at
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at
>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>        at java.lang.Thread.run(Thread.java:636)
>> Caused by: org.apache.tika.exception.TikaException: Unexpected
>> RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>>        at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>>        at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>>        at
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>>        .

Re: Multi Select Facets through Java API

2010-03-22 Thread homerlex

bump - anyone?
-- 
View this message in context: 
http://old.nabble.com/Multi-Select-Facets-through-Java-API-tp27951014p27986301.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr crashing while extracting from very simple text file

2010-03-22 Thread György Frivolt
Hi,

I had problem with indexing documents some months ago as well. I found
that there were XML control characters in the documents and these were not
handled by Solr. Maybe it is the case for you as well.

Regards,

Georg
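
For readers who do run into the control-character case, the usual remedy is to strip characters that XML 1.0 does not allow before posting the text. A minimal Python sketch follows; the function name is made up for illustration, and Ross reports elsewhere in this thread that control characters were not the cause of his crash.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)

# NUL and vertical tab are dropped; tab and newline are kept.
cleaned = strip_invalid_xml_chars("XX\x00BLE\x0b\tok\n")
print(repr(cleaned))
```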


On Sun, Mar 21, 2010 at 5:58 PM, Ross  wrote:

> Hi all
>
> I'm trying to import some text files. I'm mostly following Avi
> Rappoport's tutorial.  Some of my files cause Solr to crash while
> indexing. I've narrowed it down to a very simple example.
>
> I have a file named test.txt with one line. That line is the word
> XXBLE and nothing else
>
> This is the command I'm using.
>
> curl "
> http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true
> "
> -F "myfi...@test.txt"
>
> The result is pasted below. Other files work just fine. The problem
> seems to be related to the letters B and E. If I change them to
> something else or make them lower case then it works. In my real
> files, the XX is something else but the result is the same. It's a
> common word in the files. I guess for this "quick and dirty" job I'm
> doing I could do a bulk replace in the files to make it lower case.
>
> Is there any workaround for this?
>
> Thanks
> Ross
>
> Apache Tomcat/6.0.20 - Error
> report HTTP Status 500 -
> org.apache.tika.exception.TikaException: Unexpected RuntimeException
> from org.apache.tika.parser.txt.txtpar...@19ccba
>
> org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException
> from org.apache.tika.parser.txt.txtpar...@19ccba
>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>        at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected
> RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>        ... 18 more
> Caused by: java.lang.NullPointerException
>        at java.io.Reader.<init>(Reader.java:78)
>        at java.io.BufferedReader.<init>(BufferedReader.java:93)
>        at java.io.BufferedReader.<init>(BufferedReader.java:108)
>        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>        ... 20 more
> type Status
> reportmessage
> org.apache.tika.exception.TikaException: Unexpected
> RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
>
> org.ap

Re: MLT question

2010-03-22 Thread Marc Sturlese

> My question is how can I paginate the results of this query? For example
> instead of setting rows you must specify mlt.count in the params. But how
> can I set the offset? mlt.offset?

As in a non-MLT search request, setting the start param should paginate
the results of your response.
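
A small Python sketch of the paging arithmetic, assuming the request goes to a MoreLikeThisHandler, where the similar documents form the main result list and so honour `start` and `rows`. The function name, the field list and the document query are placeholders, not values from this thread.

```python
def mlt_page_params(doc_query, page, page_size=10, fields="title,body"):
    """Request parameters for one page of similar documents.

    page is zero-based; fields is a placeholder list for mlt.fl.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    return {
        "q": doc_query,              # selects the source document
        "mlt.fl": fields,            # fields used for similarity
        "start": page * page_size,   # offset of the first result
        "rows": page_size,           # number of results per page
    }

third_page = mlt_page_params("id:42", page=2)
print(third_page)
```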

blargy wrote:
> 
> Im playing around with MLT and I am getting back decent results when
> searching against a particular document.
> 
> My question is how can I paginate the results of this query? For example
> instead of setting rows you must specify mlt.count in the params. But how
> can I set the offset? mlt.offset?
> 
> Thanks
> 

-- 
View this message in context: 
http://old.nabble.com/MLT-question-tp27973301p27985830.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: distributed solr and tf-idf

2010-03-22 Thread Koji Sekiguchi

Pooja Verlani wrote:

Hi,
How good is tf-idf across distributed Solr shards (if it works at all with
Solr 1.4)?
Is there a chance of it getting better? I have to implement a huge index
with many shards. How is it possible to get a global tf-idf for it?
Any ideas?

Regards,
Pooja

  

Distributed IDF is not supported in 1.4. There is a patch:


https://issues.apache.org/jira/browse/SOLR-1632

Koji
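
To illustrate why this matters: when each shard computes IDF from its own statistics, the same term can get very different weights on different shards. The Python sketch below uses the classic Lucene formula, 1 + ln(numDocs / (docFreq + 1)); the shard names and counts are invented for the example, and it does not describe how the SOLR-1632 patch works.

```python
import math

def idf(num_docs, doc_freq):
    """Classic Lucene IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Invented statistics for one term: shard -> (numDocs, docFreq)
shards = {
    "shard_a": (1000, 9),    # term is rare on this shard
    "shard_b": (1000, 499),  # term is common on this shard
}

# IDF as each shard sees it in isolation.
local_idf = {name: idf(n, df) for name, (n, df) in shards.items()}

# IDF computed from the statistics of the whole collection.
total_docs = sum(n for n, _ in shards.values())
total_freq = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_freq)

for name, value in local_idf.items():
    print(name, round(value, 3))
print("global", round(global_idf, 3))
```

Scores from the two shards are merged as if they were comparable, so a match on the shard where the term is rare is ranked higher than the same match on the other shard. If documents are spread evenly across shards, the local values tend to be close to the global one and the effect is small.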

--
http://www.rondhuit.com/en/



Index field untokenized

2010-03-22 Thread Alessandro Falasca (KCTP)

Hi All,
I want to index some data untokenized (e.g. a URL), but I can't
find a way to do it.

I know there is a way to do it in the Solr configuration, but I want
to specify this option directly in my Solr XML.

This is a fragment of the XML that I post to Solr. I want to know whether it is possible to add an extra attribute to some field (e.g.
modsCollection.name.xlink:href), or to pass in some other way the information about how to index it.

http://www.fao.org/faooa/schemas/eims/v0.9"; 
xmlns:mods="http://www.loc.gov/mods/v3";
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xmlns:eims="http://www.fao.org/faooa/schemas/eims/v0.9";
xmlns:xlink="http://www.w3.org/1999/xlink"; 
xmlns:xalan="http://xml.apache.org/xalan";
xmlns:l="http://lang.data"; 
xmlns:fn="http://www.w3.org/2005/xpath-functions";
xmlns:dcterms="http://purl.org/dc/terms/"; 
xmlns:ags="http://www.fao.org/agris/agmes/schemas/0.1/";
xmlns:uvalibadmin="http://dl.lib.virginia.edu/bin/admin/admin.dtd/";

xmlns:uvalibdesc="http://dl.lib.virginia.edu/bin/dtd/descmeta/descmeta.dtd";
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"; 
xmlns:dc="http://purl.org/dc/elements/1.1/";
xmlns:foxml="info:fedora/fedora-system:def/foxml#" 
xmlns:zs="http://www.loc.gov/zing/srw/";>

eims-document:1960

.
http://aims.fao.org/aos/v01/corporatebody/c_1962

iso639-2b




Regards,
Alessandro


http://www.fao.org/faooa/schemas/eims/v0.9"; xmlns:mods="http://www.loc.gov/mods/v3";
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xmlns:eims="http://www.fao.org/faooa/schemas/eims/v0.9";
	xmlns:xlink="http://www.w3.org/1999/xlink"; xmlns:xalan="http://xml.apache.org/xalan";
	xmlns:l="http://lang.data"; xmlns:fn="http://www.w3.org/2005/xpath-functions";
	xmlns:dcterms="http://purl.org/dc/terms/"; xmlns:ags="http://www.fao.org/agris/agmes/schemas/0.1/";
	xmlns:uvalibadmin="http://dl.lib.virginia.edu/bin/admin/admin.dtd/";
	xmlns:uvalibdesc="http://dl.lib.virginia.edu/bin/dtd/descmeta/descmeta.dtd";
	xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"; xmlns:dc="http://purl.org/dc/elements/1.1/";
	xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/";>
	
		eims-document:1960
		
		Active
		Note relative à la réforme de l'ONU et de la FAO
		
		
		2010-03-11T13:37:44.537Z
		
		2010-03-11T13:39:15.819Z
		
		2
		AUDREC1
		Fedora API-M
		modifyDatastreamByValue
		
		DC
		fedoraAdmin
		
		2010-03-11T13:37:44.801Z
		
		Initial Import of this Object
		

		AUDREC2
		Fedora API-M
		addDatastream
		MODS
		fedoraAdmin
		
		2010-03-11T13:39:09.348Z
		


		AUDREC3
		Fedora API-M
		addDatastream
		AGRISFO
		fedoraAdmin
		
		2010-03-11T13:39:11.931Z
		


		AUDREC4
		Fedora API-M
		addDatastream
		EIMS
		fedoraAdmin
		
		2010-03-11T13:39:13.434Z
		


		AUDREC5
		Fedora API-M
		addDatastream
		SKOS
		fedoraAdmin
		
		2010-03-11T13:39:15.819Z
		



		fr
		Note relative à la réforme de l'ONU et de la FAO
		
		pubid.fao.org:210159
		FAO



		info:fedora/eims-document:1960
		
		faooa:FRBR-EXPRESSION
		J8010



		3.3

		2006-06-29


		fr
		Note relative à la réforme de l'ONU et de la FAO
		

		fao-aos-corporatebody
		corporate
		http://aims.fao.org/aos/v01/corporatebody/c_1962
		
		en
		FAO, Rome (Italy). Fisheries and Aquaculture
			Dept.

		marcrelator
		text
		Author
		marcrelator
		text


		conference
		en
		FAO Committee on Fisheries. Sub-Committee on
			Aquaculture (Sess. 4 : 6-10 Oct 2008 : Puerto Varas, Chile)

		marcrelator
		text
		Author
		marcrelator
		text


		type
		Conference
		type
		type
		Non-conventional
		type

		iso639-2b
		code
		fra
		iso639-2b
		code
		text
		French
		text

		jn
		J8010
		jn
		rn





		210159
		0
		3
		en
		KC



		1
		en
		Publication


	


distributed solr and tf-idf

2010-03-22 Thread Pooja Verlani
Hi,
How good is tf-idf across distributed Solr shards (if it works at all with
Solr 1.4)?
Is there a chance of it getting better? I have to implement a huge index
with many shards. How is it possible to get a global tf-idf for it?
Any ideas?

Regards,
Pooja