Re: Features not present in Solr

2010-03-21 Thread MitchK

Srikanth,

I don't know anything about Endeca, so I can't compare Solr to it.
However, I know Solr is powerful. Very powerful. 
So, maybe you should tell us more about your needs to get a good answer.

As a response to your second question: You should not expect Solr to be
a database. It is an index server. A database keeps your data safe. If
something goes wrong - which is always possible - Solr gives no guarantees.
Maybe someone else can tell you more about this topic.

- Mitch


Srikanth B wrote:
> 
> Hello
> 
> We are in the process of researching Solr features. I am looking for two
> things
> 1. Features not available in Solr but present in other products like
> Endeca
> 2. What one shouldn't expect from Solr
> 
> Any thoughts ?
> 
> Thanks in advance
> Srikanth
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Facetting with Synonyms

2010-03-21 Thread MitchK

Hi Otis,

thank you for responding.
Hmm, since I am not omniscient, this does not seem to be an option for me,
because it would mean I have to know everything about the artist at index time.
But your response got me thinking about an idea: a synonym mapper.
The synonym mapper should work on the facets returned for the query.

It is not important to map S&P to Snaga & Pillath and force Solr to combine
both result sets.
The same goes for HP and Hewlett Packard. Returning only one of those terms to
the user is enough, since I can translate "HP" to "Hewlett Packard" with the
help of a SynonymFilter at query time, if the user is interested
in such a facet.

What do you think about this?
If I want to make such changes to Solr, I think I need to customize something
that directly computes the results for the responseWriter. Do you know which
classes are responsible for that?
If this is too complicated, because one has to make changes in too
many classes, maybe I will contribute a tool which does this on an already
built response.
Another way would be to create just a new responseWriter, am I right?
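
A minimal sketch of the "tool on an already built response" idea, in Python
and outside of Solr. The function name, the input shape (a list of
term/count pairs) and the synonym table are assumptions made for
illustration, not part of any Solr API. Note that summing counts overstates
the total if a single document carries both variants of a name.

```python
def merge_facets(facet_counts, synonyms):
    """Collapse facet terms onto one canonical spelling and add up counts.

    facet_counts: iterable of (term, count) pairs taken from a response.
    synonyms: dict mapping a variant spelling to its canonical term.
    Both shapes are hypothetical; adapt them to the real response format.
    """
    merged = {}
    for term, count in facet_counts:
        canonical = synonyms.get(term, term)
        merged[canonical] = merged.get(canonical, 0) + count
    # Highest count first, ties broken alphabetically.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


synonyms = {"HP": "Hewlett Packard", "S&P": "Snaga & Pillath"}
facets = [("HP", 3), ("Hewlett Packard", 5), ("Canon", 4)]
print(merge_facets(facets, synonyms))
```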

If you think this would be a good idea, I will go on to ask some
architectural questions, to save memory and time. Maybe I will go on to open
an issue for that...

Any other ideas are welcome :-)!

Kind regards
- Mitch


Otis Gospodnetic wrote:
> 
> Hi Mitch,
> 
> You asked how others would solve this problem.  I would try to normalize
> the data before indexing it.  In other words, I'd clean it up myself to
> avoid a GIGO situation.
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
-- 
View this message in context: 
http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27982710.html



Chunked streaming upload to Solr

2010-03-21 Thread Lance Norskog
I would like to upload data to Solr for indexing, in chunks, in one
HTTP POST request. Is this possible? What exactly should I set as the
client socket parameters?

What I'm getting is that with the default parameters, the first write
adds a Content-Length matching the size of the first chunk. Solr reads
that as the entire upload. Apparently the right way to handle this is
with the HTTP request header "Transfer-Encoding" set to "chunked".
(I don't know the total size of the upload.) This results in the HTTP
parser blowing up. Here is the stack trace:

Mar 21, 2010 8:35:18 PM sun.reflect.NativeMethodAccessorImpl invoke0
WARNING: handle failed
java.io.IOException: bad chunk char: 115
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:687)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server


Has anyone made this work?
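
For reference, a sketch of what a chunked request body has to look like on
the wire. Character code 115 is the letter "s", so one plausible reading of
"bad chunk char: 115" is that the header announced chunked encoding while the
body was written unframed, and the parser found text where it expected a
hexadecimal chunk size. The function name and the sample payload are
assumptions for illustration only.

```python
def frame_chunks(chunks):
    """Frame byte strings using HTTP/1.1 chunked transfer coding.

    Each chunk is sent as: <size in hex>CRLF <data> CRLF.
    A zero-size chunk followed by an empty line ends the body.
    """
    framed = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would terminate the body early
        framed += b"%x\r\n" % len(chunk)
        framed += chunk + b"\r\n"
    framed += b"0\r\n\r\n"
    return bytes(framed)


# Hypothetical payload split into two pieces of unknown total size.
print(frame_chunks([b"<add>", b"</add>"]))
```

HTTP client libraries usually do this framing themselves once chunked mode is
enabled, so writing raw data to the socket after setting the header by hand
is the case to look out for.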

Thanks,

-- 
Lance Norskog
goks...@gmail.com


Re: Facetting with Synonyms

2010-03-21 Thread Otis Gospodnetic
Hi Mitch,

You asked how others would solve this problem.  I would try to normalize the
data before indexing it.  In other words, I'd clean it up myself to avoid a GIGO
situation.
 Otis
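
A minimal sketch of such a clean-up step, run on field values before they are
sent to Solr. The lookup table and the function name are assumptions for
illustration; a real table would have to be maintained by whoever knows the
data.

```python
import re

# Hypothetical table: normalized variant -> canonical name.
CANONICAL = {
    "s & p": "Snaga & Pillath",
    "hp": "Hewlett Packard",
}


def normalize_value(value, canonical=CANONICAL):
    """Return the canonical spelling of a value, or the trimmed input."""
    trimmed = value.strip()
    # Make "S&P", "S & P" and "S  &  P" compare equal.
    key = re.sub(r"\s*&\s*", " & ", trimmed)
    key = re.sub(r"\s+", " ", key).lower()
    return canonical.get(key, trimmed)


for raw in ["S&P", "S & P", "HP", " Canon "]:
    print(repr(raw), "->", repr(normalize_value(raw)))
```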

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: MitchK 
> To: solr-user@lucene.apache.org
> Sent: Sun, March 21, 2010 12:28:16 PM
> Subject: Facetting with Synonyms
> 
> 
> Hello out there,
> 
> I've got a little problem:
> Users decide what will be indexed and what will not. Sometimes this causes a
> little problem:
> For example: The artists "Snaga & Pillath" are also known as "S & P". When I
> index the document, I can solve this problem with the help of a
> SynonymFilter. However, if I want to retrieve some facets over a
> result set, there is a little problem: "S&P" and "Snaga & Pillath" will both
> be returned.
> Is there a way to return only "S&P" OR "Snaga & Pillath"?
> 
> I think another example of something like this is "HP" and "Hewlett
> Packard". If one user calls the manufacturer of his printer "HP" and another
> one says "Hewlett Packard" and you want to do some faceting, there will be
> two returned terms.
> 
> But the truth is: Every HP and every Hewlett Packard facet, as well as every
> Snaga & Pillath/S&P facet, should cover the same documents.
> 
> How would you solve this problem?
> 
> Kind regards
> - Mitch
> -- 
> View this message in context:
> http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27976997.html



Re: Features not present in Solr

2010-03-21 Thread Gora Mohanty
On Mon, 22 Mar 2010 05:12:06 +0530
Srikanth B  wrote:

> Thanks, but I'm looking for answers on the functional and technical
> front.
[...]

Yours is a very broad question, and the details of the answers
probably depend on the domain that you are trying to use Solr in.

Solr is extensively documented on the Wiki, and the Solr 1.4 book
is available. Why not look for yourself, and see if Solr meets
your needs?

Regards,
Gora


Re: Weired behaviour for certain search terms

2010-03-21 Thread Akash Sahu

The search is now working for those terms. I made the following changes.
In the schema file, I replaced
 

with 

.



Ahmet Arslan wrote:
> 
> 
>> I tried adding &hl.maxAnalyzedChars=-1 to my search
>> query but it didn't
>> help.
>> Just wanted to know if there are limitations on certain
>> search terms.
>> It's a bit strange that Solr is not behaving properly for
>> certain terms
>> (especially when returning the excerpts in the highlighting
>> dictionary).
>> The terms which i have found so far are:
>> 1. co-ownership
>> 2. "co ownership"
>> 3. co-employees
>> 
> 
> Can you paste your field type definition and declaration? Are you storing
> term vectors? Also, can you give us a query and document pair (where the query
> returns the document but no highlighting)? I will try to reproduce the problem.
> 
> Also what happens when you use &hl.usePhraseHighlighter=false?
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Weired-behaviour-for-certain-search-terms-tp27927995p27981626.html



Re: Features not present in Solr

2010-03-21 Thread Srikanth B
Thanks, but I'm looking for answers on the functional and technical front.

On 3/20/10, Israel Ekpo  wrote:
>
> One feature that is not available in Solr is any licensing fees and fine
> print.
>
> Also you should not expect to pay in order to use Solr.
>
> On Fri, Mar 19, 2010 at 11:16 PM, Srikanth B 
> wrote:
>
> > Hello
> >
> > We are in the process of researching Solr features. I am looking for two
> > things
> > 1. Features not available in Solr but present in other products like
> > Endeca
> > 2. What one shouldn't expect from Solr
> >
> > Any thoughts ?
> >
> > Thanks in advance
> > Srikanth
> >
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


dismax and q.op

2010-03-21 Thread Mark Fletcher
Hi,

I am using the dismax handler. I have it set up in my solrconfig.xml.
 I have *not* used default="true" while setting it up (the standard handler
still has default="true").
 *I haven't specified a value for mm.*
In my schema.xml I have set the default operator to AND.
When I query, I use the following in my query URL, where my query is, for
example, *international monetary fund*:
.../select?*q.alt*=international+monetary+fund&*qt=dismax*

My result: no results, but each of the terms individually gave me results!

I would appreciate any help with the following questions:
1. Will the query look for documents that have *international* AND *monetary*
AND *fund*, or is it some other behavior based on the settings I have
mentioned above?
2. Does the default operator specified in schema.xml take effect when we use
dismax as well, or is it only for the *standard* request handler? If it does
have an effect and we specify a value for mm, say 90%, will that override the
schema.xml default operator setup?
3. How do q.alt and q differ in behavior in the above case? I found that q.alt
gives me the same results I got when I used the standard RH, hence I used it.
4. When I make a change to the dismax setup in solrconfig.xml, I believe I
just have to bounce the Solr server. Do I need to re-index for the change to
take effect?
5. If I use dismax, how do I use the ANALYSIS feature on the admin console
that is otherwise used for the *standard* RH?
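
For comparison, a small sketch that builds the same request with q instead of
q.alt and with an explicit mm value, so the match requirement is stated in the
request itself. The parameter names q, qt and mm are the ones used above; the
helper name and the base path are assumptions for illustration, and whether
this changes the results depends on the handler configuration.

```python
from urllib.parse import urlencode


def dismax_url(base, user_query, mm="100%"):
    """Build a select URL that sends the user query through dismax.

    base is a hypothetical path such as "/solr/select"; mm is passed
    explicitly instead of relying on a configured default.
    """
    params = {"q": user_query, "qt": "dismax", "mm": mm}
    return base + "?" + urlencode(params)


print(dismax_url("/solr/select", "international monetary fund"))
print(dismax_url("/solr/select", "international monetary fund", mm="2"))
```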

Thanks for your patience.

Best Rgds,
Mark.


Re: trimfilterfactory on string fieldtype?

2010-03-21 Thread Ahmet Arslan
>  Can the trim filter factory work on
> string fieldtypes?

No. String field type (solr.StrField) is not analyzed.

You can use charfilter,tokenizer,tokenfilter with solr.TextField.
You can use this (TrimmedString) field definition:



   
   
   




  


Re: related search

2010-03-21 Thread Andrew Greenburg
On Sun, Mar 21, 2010 at 4:30 AM, Suram  wrote:
>
> Thanks a lot, Ahmet Arslan.
>
> How can I make a query to get the synonym values? Any suggestions?

You need to apply the SynonymFilterFactory to your queries as well.


use termscomponent like spellComponent ?!

2010-03-21 Thread stocki

hello.

I am playing with Solr but I haven't found the perfect solution for me yet.

My goal is a search like the Amazon search in the iPhone app. ;)

Is it possible to use the TermsComponent like the SpellCheckComponent, so that
the TermsComponent works with more than one single term?

I have these 3 docs with the following names in my index:
 - nikon one
 - nikon two
 - nikon three

So when I search for "nik", the TermsComponent suggests "nikon". That's exactly
what I want.
But when I type "nikon on", I want Solr to suggest "nikon one".

How can that be done? Please help me, somebody ;)

I think a merge of the TermsComponent and the SpellCheckComponent would be the
best solution.
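
To make the wanted behavior concrete, here is a sketch in plain Python,
outside of Solr: the whole name is treated as one value and the suggestion
completes only the word currently being typed. The function name and the
in-memory list are assumptions for illustration and say nothing about how
either component works internally.

```python
def suggest(prefix, names):
    """Complete the word being typed, keeping the words already entered."""
    prefix = prefix.lower()
    suggestions = set()
    for name in names:
        lowered = name.lower()
        if not lowered.startswith(prefix):
            continue
        # Cut the match at the end of the word the prefix stops in.
        end = lowered.find(" ", len(prefix))
        suggestions.add(lowered if end == -1 else lowered[:end])
    return sorted(suggestions)


names = ["nikon one", "nikon two", "nikon three"]
print(suggest("nik", names))
print(suggest("nikon on", names))
```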

 
This is my search field. Did I use the correct type?


-- 
View this message in context: 
http://old.nabble.com/use-termscomponent-like-spellComponent--%21-tp27977008p27977008.html



Solr crashing while extracting from very simple text file

2010-03-21 Thread Ross
Hi all

I'm trying to import some text files. I'm mostly following Avi
Rappoport's tutorial.  Some of my files cause Solr to crash while
indexing. I've narrowed it down to a very simple example.

I have a file named test.txt with one line. That line is the word
XXBLE and nothing else

This is the command I'm using.

curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true" -F "myfi...@test.txt"

The result is pasted below. Other files work just fine. The problem
seems to be related to the letters B and E. If I change them to
something else or make them lower case then it works. In my real
files, the XX is something else but the result is the same. It's a
common word in the files. I guess for this "quick and dirty" job I'm
doing I could do a bulk replace in the files to make it lower case.

Is there any workaround for this?

Thanks
Ross

Apache Tomcat/6.0.20 - Error
report HTTP Status 500 -
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.txt.txtpar...@19ccba

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.txt.txtpar...@19ccba
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
... 18 more
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at java.io.BufferedReader.<init>(BufferedReader.java:93)
at java.io.BufferedReader.<init>(BufferedReader.java:108)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
... 20 more

Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-21 Thread Ukyo Virgden
Hi,

I'm trying to index HTML documents with different encodings. My HTML files are
in win-12XX, ISO-8859-X or UTF-8 encoding. The handler correctly parses
all HTML in their respective encodings and indexes them. However, on the web
interface I'm developing I enter query terms in UTF-8, which naturally do
not match content in different encodings. Also, the results I see in
my web app are not UTF-8 encoded as expected.

My question: is there any filter I can use to convert all content extracted
by the handler to UTF-8 prior to indexing?

Does it make sense to write a filter which would convert tokens to UTF-8, or
is it even possible with multiple encodings?
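
As a point of reference, a small sketch of what "convert to one encoding"
means at the byte level: text that is decoded with its own charset ends up as
the same Unicode string, whatever the bytes looked like. The function name
and the sample text are assumptions for illustration; the sketch does not
show where in the extraction pipeline such a step would belong.

```python
def to_unicode(raw, declared_encoding, fallback="utf-8"):
    """Decode bytes with the declared charset, falling back if it fails."""
    try:
        return raw.decode(declared_encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or invalid bytes: keep going with replacements.
        return raw.decode(fallback, errors="replace")


text = "10 \u20ac"  # "10" followed by the euro sign
samples = {
    "cp1252": text.encode("cp1252"),
    "iso-8859-15": text.encode("iso-8859-15"),
    "utf-8": text.encode("utf-8"),
}
for charset, raw in samples.items():
    decoded = to_unicode(raw, charset)
    print(charset, raw, decoded == text, decoded.encode("utf-8"))
```

If decoded text still fails to match, the usual suspects are a wrong declared
charset on the document or a query string that is not sent as UTF-8.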

Thanks in advance.
Ukyo


Facetting with Synonyms

2010-03-21 Thread MitchK

Hello out there,

I've got a little problem:
Users decide what will be indexed and what will not. Sometimes this causes a
little problem:
For example: The artists "Snaga & Pillath" are also known as "S & P". When I
index the document, I can solve this problem with the help of a
SynonymFilter. However, if I want to retrieve some facets over a
result set, there is a little problem: "S&P" and "Snaga & Pillath" will both
be returned.
Is there a way to return only "S&P" OR "Snaga & Pillath"?

I think another example of something like this is "HP" and "Hewlett
Packard". If one user calls the manufacturer of his printer "HP" and another
one says "Hewlett Packard" and you want to do some faceting, there will be
two returned terms.

But the truth is: Every HP and every Hewlett Packard facet, as well as every
Snaga & Pillath/S&P facet, should cover the same documents.

How would you solve this problem?

Kind regards
- Mitch
-- 
View this message in context: 
http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27976997.html



Re: Boundary match as part of query language?

2010-03-21 Thread David Smiley @MITRE.org

By the way, you'll probably want to shingle or use CommonGrams (with _BEGIN &
_END being "common") for acceptable performance.

I'm wondering if Lucene's new payload features might provide an alternative
mechanism to mark the first and last term.
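
A minimal sketch of the boundary-token idea discussed in this thread, in
plain Python rather than as an analyzer: _BEGIN and _END are added around the
tokens of each value, and starts-with, ends-with and equals become phrase
matches that include one or both markers. All function names are assumptions
for illustration.

```python
BEGIN, END = "_BEGIN", "_END"


def index_tokens(value):
    """Tokenize on whitespace and wrap the tokens in boundary markers."""
    return [BEGIN] + value.lower().split() + [END]


def contains_phrase(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def starts_with(value, query):
    return contains_phrase(index_tokens(value), [BEGIN] + query.lower().split())


def ends_with(value, query):
    return contains_phrase(index_tokens(value), query.lower().split() + [END])


def equals(value, query):
    return contains_phrase(index_tokens(value), index_tokens(query))


print(starts_with("New York City", "New York"))
print(ends_with("I love New York", "New York"))
print(equals("New York City", "New York"))
```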

~ David Smiley


hossman wrote:
> 
> 
> : Now, I know how to work-around this, by appending some unique character 
> : sequence at each end of the field and then include this in my search in 
> : the front end. However, I wonder if any of you have been planning a 
> : patch to add a native boundary match feature to Solr that would 
> : automagically add tokens (also for multi-value fields!), and expand the 
> : query language to allow querying for starts-with(), ends-with() and 
> : equals()
> 
> well, if you *always* want boundary rules to be applied, that can be done 
> as simply as adding your boundary tokens automatically in both the index and 
> query time analyzers ... then a search for q="New York" can 
> automatically be translated into a PhraseQuery for "_BEGIN New York _END"
> 
> If you want special QueryParser markup to specify when you want specific 
> boundary conditions, that can also be done with a custom QParser, and 
> automatically applying the boundary tokens in your indexing analyzer (but not 
> the query analyzer -- the QParser would take care of that part).  In 
> general though it's hard to see how something like q=begin(New York) is 
> easier syntax than q="_BEGIN New York"
> 
> The point is it's relatively easy to implement something like this when 
> meeting specific needs, but I don't know of anyone working on a truly 
> generalized QParser that deals with this -- largely because most people 
> who care about this sort of thing either have really complicated use cases 
> (ie: not just begin/end boundary markers, but also want sentence, 
> paragraph, page, chapter, section, etc...) or want extremely specific 
> query syntax (ie: they're trying to recreate the syntax of an existing 
> system they are replacing) so a general solution doesn't work well.
> 
> The closest I've ever seen is Mark Miller's QSolr parser, which actually 
> went a completely different direction using a home grown syntax to 
> generate Span queries ... if that slacker ever gets off his butt and 
> starts running his webserver again, you could download it and try it out, 
> and probably find that it would be trivial to turn it into a QParser.
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Boundary-match-as-part-of-query-language--tp27851560p27976989.html



Re: related search

2010-03-21 Thread Suram

Thanks a lot, Ahmet Arslan.

How can I make a query to get the synonym values? Any suggestions?

-- 
View this message in context: 
http://old.nabble.com/related-search-tp27933778p27974649.html