locks in solr

2009-11-25 Thread Rakhi Khatwani
Hi,
  Is there any article which explains the locks in Solr?
There is some info in solrconfig.xml which says that you can set the lock
type to none (NoLockFactory), single (SingleInstanceLockFactory), native
(NativeFSLockFactory), or simple (SimpleFSLockFactory), which locks every
time we create a new file.
suppose my index dir has the following files:
_2s.fdt, _2t.fnm, _2u.nrm, _2v.tii, _2x.fdt, _2y.fnm, _2z.nrm, _30.tii,
_2s.fdx, _2t.frq, _2u.prx, _2v.tis _2x.fdx, _2y.frq _2z.prx, _30.tis,
_2s.fnm, _2t.nrm, _2u.tii, _2w.fdt _2x.fnm, _2y.nrm _2z.tii, segments_2s,
_2s.frq, _2t.prx, _2u.tis, _2w.fdx _2x.frq, _2y.prx _2z.tis, segments.gen

1.) I assume for each of these files there is a lock. Please correct me if
I am wrong.
2.) what are the different lock types in terms of read/write/updates?
3.) Can we have a document level locking scheme?
4.) We would like to know the best way to handle multiple simultaneous
writes to the index.
Thanks a ton,
Raakhi


Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Jörg Agatz
Hello, at the moment I am trying to create a Solr instance with more than
one core.

I am using Solr 1.4, and multicore runs. :-)
But I don't know how to post an XML file to one of my cores. At the moment I use

"java -jar post.jar *.xml"

Now I want to fill the core0 index with core0*.xml, and core1 with core1*.xml.
But how?
I can't find anything about that in the wiki.

King


Re: Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
try this
java -Durl=http://localhost:8983/solr/core0/update -jar post.jar *.xml
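Extending that, a sketch for the per-core layout asked about in the original question (assumes the default example port 8983 and core names core0/core1):

```shell
# -Durl overrides post.jar's default update endpoint, so each set of
# files can be sent to its own core:
java -Durl=http://localhost:8983/solr/core0/update -jar post.jar core0*.xml
java -Durl=http://localhost:8983/solr/core1/update -jar post.jar core1*.xml
```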


On Wed, Nov 25, 2009 at 3:23 PM, Jörg Agatz  wrote:
> Hello, at the moment I am trying to create a Solr instance with more than
> one core.
>
> I am using Solr 1.4, and multicore runs. :-)
> But I don't know how to post an XML file to one of my cores. At the moment I use
>
> "java -jar post.jar *.xml"
>
> Now I want to fill the core0 index with core0*.xml, and core1 with core1*.xml.
> But how?
> I can't find anything about that in the wiki.
>
> King
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Jörg Agatz
Thanks, it works really well.

Maybe you have an idea how to search in core0 and core1?

I want to search in all cores, or in only 2 of the 3 cores.


Sending Tika parse result to Solr

2009-11-25 Thread Daniel Knapp
Hello,


I want to send the Tika parse results of my data to my Solr server.
My file server is not my Solr server, so Solr Cell is not an option for me.

In Lucene I can pass my Reader object (as a result of the parsing) to a Lucene
Document for indexing.

Is this also possible with Solr? Or is there another or better way to do this?
I'm using SolrJ for the connection.


Regards,
Daniel 



Buggy search Solr1.4 Multicore

2009-11-25 Thread Jörg Agatz
Hi...

I have a problem with Solr. I tried it with 3 cores, and it starts. I can
search, but I only get results when I search for exactly the whole field.

I mean, the field contains: "Dell Widescreen Ultra"

When I search for
"name:Widescreen" I get nothing
"name:Dell Widescreen Ultra" I get the document
"name:Dell*" I get the document

Now I created copyFields and searched only for "Dell*" and got it, but
"Widescreen" gives nothing.

What is wrong with the index?

I want to search for each word in each field!

Please help me.


Re: Buggy search Solr1.4 Multicore

2009-11-25 Thread Rafał Kuć
Hello!

> Hi...

> I have a problem with Solr. I tried it with 3 cores, and it starts. I can
> search, but I only get results when I search for exactly the whole field.

> I mean, the field contains: "Dell Widescreen Ultra"

> When I search for
> "name:Widescreen" I get nothing
> "name:Dell Widescreen Ultra" I get the document
> "name:Dell*" I get the document

> Now I created copyFields and searched only for "Dell*" and got it, but
> "Widescreen" gives nothing.

> What is wrong with the index?

> I want to search for each word in each field!

> Please help me.

I assume your name field type is string, right? If so, change it to text;
it should then work as you would like.
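To illustrate the suggestion, a hedged sketch of the schema.xml change (the field name comes from the thread; the attributes are assumptions):

```xml
<!-- A string field is indexed as one untokenized term, so only exact
     whole-value (or prefix-wildcard) matches work: -->
<field name="name" type="string" indexed="true" stored="true"/>

<!-- The example "text" type tokenizes and lowercases, so
     name:Widescreen matches individual words within the value: -->
<field name="name" type="text" indexed="true" stored="true"/>
```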

-- 
Regards,
 Rafał Kuć



Help on this parsed query

2009-11-25 Thread revas
I have the text analyzer defined as follows:

When I search on this field (name "simple", of the above field type) for the term
peRsonal,

*I expect it to search as:   simple:personal   simple:pe   simple:rsonal*

instead, the parsed query string says:

*simple:peRsonal*
*simple:peRsonal*
*MultiPhraseQuery(simple:"(person pe) rsonal")*
*simple:"(person pe) rsonal"*

What is this MultiPhraseQuery, and why is this a phrase query instead of a
simple query?

Regards
Revas


Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
Hello,

I am trying to search in more than one core.

I searched the wiki, but I can't find any way to search in 2 of the 3 cores,
or a way to search in all cores.

Maybe someone of you has tried the same and can help me?


Re: Sending Tika parse result to Solr

2009-11-25 Thread Grant Ingersoll

On Nov 25, 2009, at 5:32 AM, Daniel Knapp wrote:

> Hello,
>
>
> I want to send the Tika parse results of my data to my Solr server.
> My file server is not my Solr server, so Solr Cell is not an option for me.
>
> In Lucene I can pass my Reader object (as a result of the parsing) to a
> Lucene Document for indexing.
>
> Is this also possible with Solr? Or is there another or better way to do
> this?
> I'm using SolrJ for the connection.

You can't pass your Reader object, but I have opened
https://issues.apache.org/jira/browse/SOLR-1526 to provide a SolrJ client-side
equivalent of Solr Cell.  If you'd like to contribute a patch, that would be
great.  Basically, you just need to have your Handler override create
SolrInputDocuments (batches, that is) and then send them to Solr.  Using the
streaming server may also fit well with this model.
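Until SOLR-1526 lands, a rough client-side sketch of the approach described here: parse with Tika locally, build a SolrInputDocument, and send it via SolrJ. The field names ("id", "content") and the file name are assumptions for illustration, not a confirmed API.

```java
import java.io.File;
import java.io.FileInputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaToSolr {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        File file = new File("doc123.txt");
        BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
        Metadata metadata = new Metadata();
        // AutoDetectParser picks a concrete parser from the detected content type.
        new AutoDetectParser().parse(new FileInputStream(file), handler, metadata);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.getName());          // assumed uniqueKey field
        doc.addField("content", handler.toString()); // extracted plain text
        server.add(doc);
        server.commit();
    }
}
```

This keeps the Tika dependency on the client machine, which matches the constraint that the file server and Solr server are separate boxes.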



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: how is score computed with hsin functionquery?

2009-11-25 Thread Grant Ingersoll

On Nov 24, 2009, at 6:22 PM, gdeconto wrote:

> 
> 
> gdeconto wrote:
>> 
>> ...
>> is there some way to convert the hsin value to distance?
>> ...
>> 
> 
> I just noticed that the solr wiki states "Values must be in Radians" and all
> my test values were in degrees.

Yep.  Also note that I added deg() and rad() functions, but for the most part
it is probably better to do the conversion during indexing.
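The index-time conversion suggested here is a one-liner; a sketch in Python of what a feeding script might do before sending latitude/longitude values to Solr (the example coordinates are arbitrary):

```python
import math

def to_radians(deg):
    """Convert a degree coordinate to radians before indexing,
    so hsin() can be fed radians directly at query time."""
    return math.radians(deg)

# Example: a latitude/longitude pair captured in degrees.
lat_deg, lon_deg = 44.9778, -93.2650
lat_rad, lon_rad = to_radians(lat_deg), to_radians(lon_deg)
print(lat_rad, lon_rad)
```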


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: how to do partial word searches?

2009-11-25 Thread Joel Nylund

Hi Erick,

thanks for the links. I read both of them and I still have no idea
what to do; lots of back and forth, but I didn't see any solution in them.

One person talked about indexing the field in reverse and doing an OR
on it; this might work, I guess.


thanks
Joel


On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:


copying from Erik Hatcher:

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.

There's a pretty extensive recent exchange on this, see the
thread on the user's list titled

"leading and trailing wildcard query"

Best
Erick

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund
wrote:

Hi, I saw some older postings on this, but didn't see a resolution.

I have a field called title. I would like to be able to find partial word
matches within the title.

For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
the daily dish | by andrew sullivan

but it doesn't. It does find sully (which is fine with me also as a bonus),
but it doesn't seem to get any of the partial word stuff. Oddly enough, before I
lowercased the title, the wildcard matching seemed to work a bit better; it
just didn't deal with the case-sensitive query.

At first I had mixed-case titles, and I read that the wildcard doesn't work
with mixed case, so I created another field that is a lowered version of the
title called "textTitle"; it is of type text.

Is it possible with Solr to achieve what I am trying to do, and if so, how? If
not, anything closer than what I have?

thanks
Joel






Re: Implementing phrase autopop up

2009-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 11:58 PM, darniz  wrote:

>
>
> I created a field the same as the Lucid blog says.
>
>  omitNorms="true" omitTermFreqAndPositions="true"/>
>
> with the following field configuration
>
>  positionIncrementGap="100">
> −
> 
> 
> 
>  maxGramSize="25"/>
> 
> −
> 
> 
> 
> 
> 
>
> Now when I query, I get the correct phrases. For example, if I search for
> autocomp:"how to" I get all the correct phrases, like
>
> How to find a car
> How to find a mechanic
> How to choose the right insurance company
>
> etc... which is good.
>
> Now I have two questions.
> 1) Is it necessary to give the query in quotes? My gut feeling is yes, since
> if you don't give quotes I get phrases beginning with How followed by some
> other words, like How can etc.
>

Yes, since we want to do phrase searches on n-grams.



> 2) If I search for a single word, for example choose, it gives me nothing.
> I was expecting to see a result, considering there is the word "choose" in
> the phrase
> How to choose the right insurance company
>
> I might look more at the documentation, but do you have any advice?
>
>
EdgeNgram creates n-grams from the starting or the ending edge therefore you
can't match words in the middle of a phrase. Try using NGramFilterFactory
instead.
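To make that concrete, a hedged sketch of swapping the filter into the index analyzer. The tokenizer, filter order, and minGramSize/maxGramSize values are assumptions to be tuned; note that NGramFilterFactory can inflate the index considerably compared to EdgeNGramFilterFactory.

```xml
<fieldType name="autocomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- n-grams from anywhere in the token, not just the leading edge -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```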

-- 
Regards,
Shalin Shekhar Mangar.


Re: Buggy search Solr1.4 Multicore

2009-11-25 Thread Erick Erickson
If Rafal's response doesn't help (though it's certainly where I'd
look first; it sounds like you're using a field type that's
not tokenized), could you post the relevant parts
of your config file that define the field and the analyzers
used at *both* query and index time?

Best
Erick

On Wed, Nov 25, 2009 at 6:35 AM, Jörg Agatz wrote:

> Hi...
>
> I have a problem with Solr. I tried it with 3 cores, and it starts. I can
> search, but I only get results when I search for exactly the whole
> field.
>
> I mean, the field contains: "Dell Widescreen Ultra"
>
> When I search for
> "name:Widescreen" I get nothing
> "name:Dell Widescreen Ultra" I get the document
> "name:Dell*" I get the document
>
> Now I created copyFields and searched only for "Dell*" and got it, but
> "Widescreen" gives nothing.
>
> What is wrong with the index?
>
> I want to search for each word in each field!
>
> Please help me.
>


Re: Help on this parsed query

2009-11-25 Thread Erick Erickson
I think because if it wasn't a phrase query you'd be matching on
the broken up parts of the word *wherever* they were in your field.
e.g. "pe" and "rsonal" could be separated by any number of other
tokens and you'd get a match.

HTH
Erick

P.S. I was a bit confused by your asterisks; it took me a while to figure
out that you'd added them by hand for emphasis and weren't sending
wildcards through.

On Wed, Nov 25, 2009 at 6:43 AM, revas  wrote:

> I have the text analyzer defined as follows
>
>
> 
>
> 
>
> 
>
> 
> ignoreCase="true"
>
> words="stopwords.txt"
>
> enablePositionIncrements="true"
>
> />
>
>  generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>  ignoreCase="true" expand="true"/>
>
>  words="stopwords.txt"/>
>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
>
> When I search on this field (name "simple", of the above field type) for the
> term peRsonal,
>
> *I expect it to search as:   simple:personal   simple:pe   simple:rsonal*
>
>
>
> instead, the parsed query string says:
>
> *simple:peRsonal*
> *simple:peRsonal*
> *MultiPhraseQuery(simple:"(person pe) rsonal")*
> *simple:"(person pe) rsonal"*
>
>
> What is this MultiPhraseQuery, and why is this a phrase query instead of a
> simple query?
>
>
> Regards
> Revas
>


Re: SolrPlugin Guidance

2009-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent <
laurent.vauth...@disney.com> wrote:

>
> Our team is trying to make a Solr plugin that needs to parse/decompose a
> given query into potentially multiple queries.  The idea is that we're
> trying to abstract a complex schema (with different document types) from
> the users so that their queries can be simpler.
>
>
>
> So basically, we're trying to do the following:
>
>
>
> 1.   Decompose query A into query B and query C
>
> 2.   Send query B to all shards and plug query B's results into
> query C
>
> 3.   Send Query C to all shards and pass the results back to the
> client
>
>
>
> I started trying to implement this by subclassing the SearchHandler but
> realized that I would not have access to HttpCommComponent.  Then I
> tried to replicate the SearchHandler class but realized that I might not
> have access to fields I would need in ShardResponse.  So I figured I
> should step back and get advice from the mailing list now. :)  What is
> the best plugin point for decomposing a query into multiple queries so
> that all resultant queries can be sent to each shard?
>
>
>
All queries are sent to all shards? If yes, it sounds like a job for a
custom QParser.

-- 
Regards,
Shalin Shekhar Mangar.


Re: locks in solr

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 3:05 PM, Rakhi Khatwani  wrote:

> Hi,
>  Is there any article which explains the locks in solr??
> there is some info in solrconfig.xml which says that you can set the lock
> type to none (NoLockFactory), single (SingleInstanceLockFactory), native
> (NativeFSLockFactory), or simple (SimpleFSLockFactory), which locks every time
> we
> create a new file.
> suppose my index dir has the following files:
> _2s.fdt, _2t.fnm, _2u.nrm, _2v.tii, _2x.fdt, _2y.fnm, _2z.nrm, _30.tii,
> _2s.fdx, _2t.frq, _2u.prx, _2v.tis _2x.fdx, _2y.frq _2z.prx, _30.tis,
> _2s.fnm, _2t.nrm, _2u.tii, _2w.fdt _2x.fnm, _2y.nrm _2z.tii, segments_2s,
> _2s.frq, _2t.prx, _2u.tis, _2w.fdx _2x.frq, _2y.prx _2z.tis, segments.gen
>
> 1.) I assume for each of these files there is a lock. Please correct me
> if
> I am wrong.
>

No. The index directory has one lock. Individual files are not locked
separately.


> 2.) what are the different lock types in terms of read/write/updates?
>

Locks are only used to prevent more than one IndexWriter (or Solr
instance/core) from writing to the same index. They do not prevent reads. They
also do not prevent multiple writes from the same Solr core (there is some
synchronization, but it has nothing to do with these locks).


> 3.) Can we have a document level locking scheme?
>

No. I think you have grossly misunderstood the purpose of locks in Solr.


> 4.) We would like to know the best way to handle multiple simultaneous
> writes to the index
>

With one Solr instance, you can do writes concurrently without a problem.
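For reference, the relevant solrconfig.xml knob looks like this; "native" is shown here only as an assumed, sensible choice for a single local index, not as advice from the thread:

```xml
<indexDefaults>
  <!-- One lock per index directory. It only keeps a second
       IndexWriter / Solr core from opening the same index for writing. -->
  <lockType>native</lockType> <!-- none | single | native | simple -->
</indexDefaults>
```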

-- 
Regards,
Shalin Shekhar Mangar.


Re: why is XMLWriter declared as final?

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell  wrote:

> Is there any reason the XMLWriter is declared as final? I'd like to extend
> it for a special case but can't. The other writers (ruby, php, json) are
> not
> final.
>
>
I don't think it needs to be final. Maybe it is final because it wasn't
designed to be extensible. Please open a jira issue.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 5:39 PM, Jörg Agatz wrote:

> Hello,
>
> I am trying to search in more than one core.
>
> I searched the wiki, but I can't find any way to search in 2 of the 3 cores,
> or a way to search in all cores.
>
> Maybe someone of you has tried the same and can help me?
>

You need to provide the URLs of the cores in the distributed search request. It
will make HTTP calls to the specified cores, but there is no way around that
right now.

http://wiki.apache.org/solr/DistributedSearch
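For example, hedged sketches of such requests against the example port (core names assumed from the thread):

```text
# Search core0 and core1 only:
http://localhost:8983/solr/core0/select?q=dell&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

# Search all three cores:
http://localhost:8983/solr/core0/select?q=dell&shards=localhost:8983/solr/core0,localhost:8983/solr/core1,localhost:8983/solr/core2
```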

Why do you want to search across cores on the same Solr?

-- 
Regards,
Shalin Shekhar Mangar.


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-25 Thread javaxmlsoapdev

Grant, can you assist? I am clueless as to why it's not indexing the content
of the file. I have provided schema and code info below/in previous threads. Do I
need to explicitly add param("content", "') into ContentStreamUpdateRequest?
I don't think that is the right thing to do. Please advise.

let me know if you need anything else. Appreciate your help.

Thanks,

javaxmlsoapdev wrote:
> 
> Following is the Luke response.  is empty. Can someone
> help find out why the file content isn't being indexed?
> 
>
>  
>  
>   0 
>   0 
>   
>  
>   0 
>   0 
>   0 
>   1259085661332 
>   false 
>   true 
>   false 
>name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index
>  
>   2009-11-24T18:01:01Z 
>   
>
>  
>  
>   Indexed 
>   Tokenized 
>   Stored 
>   Multivalued 
>   TermVector Stored 
>   Store Offset With TermVector 
>   Store Position With TermVector 
>   Omit Norms 
>   Lazy 
>   Binary 
>   Compressed 
>   Sort Missing First 
>   Sort Missing Last 
>   
>   Document Frequency (df) is not updated when a document
> is marked for deletion. df values include deleted documents. 
>   
>   
> 
> javaxmlsoapdev wrote:
>> 
>> I was able to configure /docs index separately from my db data index.
>> 
>> Still I am seeing the same behavior, where it only puts .docName & its size in
>> the "content" field (I have renamed the field to "content" in this new
>> schema).
>> 
>> below are the only two fields I have in schema.xml
>> > required="true" /> 
>> > multiValued="true"/>  
>> 
>> Following is updated code from test case
>> 
>> File fileToIndex = new File("file.txt");
>> 
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>  
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>> 
>> Also, from the Solr admin UI, only when I search for "doc123.txt" does it
>> return me the following response. Not sure why it's not indexing the file's
>> content into the "content" attribute.
>> - 
>> - 
>> - 
>>   702 
>>   text/plain 
>>   doc123.txt 
>>
>>   
>>   8978 
>>   
>>   
>> 
>> Any idea?
>> 
>> Thanks,
>> 
>> 
>> javaxmlsoapdev wrote:
>>> 
>>> http://machinename:port/solr/admin/luke gives me a 404 error, so it seems
>>> it's not able to find Luke.
>>> 
>>> I am reusing a schema which is used for indexing another entity from the
>>> database, which has no relevance to documents. That was my next question:
>>> what do I put in a schema if my documents don't need any column
>>> mappings or anything? Plus, I want to keep the file documents index
>>> separate from the database entity index. If I don't have any db columns
>>> to map, and the file documents index should live separately from the db
>>> entity index, what's the best way to achieve this?
>>> 
>>> thanks,
>>> 
>>> 
>>> 
>>> Grant Ingersoll-6 wrote:
 
 
 On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
 
> 
> *:* returns me 1 count but when I search for specific word (which was
> part of
> .txt file I indexed before) it doesn't return me anything. I don't
> have luke
> setup on my end.
 
http://localhost:8983/solr/admin/luke should give you some info.
 
 
> let me see if I can set that up quickly but otherwise do
> you see anything I am missing in solrconfig mapping or something?
 
 What's your schema look like and how are you querying?
 
> which maps
> document "content" to wrong attribute?
> 
> thanks,
> 
> Grant Ingersoll-6 wrote:
>> 
>> 
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>> 
>>> 
>>> Following code is from my test case where it tries to index a file
>>> (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  
>>> server.request(up); 
>>> 
>>> The test case doesn't give me any error, and "I think" it's indexing the
>>> file? But when I search for text (which was part of the .txt file) the
>>> search doesn't return me anything.
>> 
>> What do your logs show?  Else, what does Luke show or doing a *:*
>> query
>> (assuming this is the only file you added)?
>> 
>> Also, I don't think you need ext.literal anymore, j

Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
> Why do you want to search across cores on the same Solr?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

I only need multi-indexing, but I can find no other way to import other indexes.

I have some old indexes from another project, and want to use them in Solr. If I
use one index it works, but I have a lot of indexes, so I need to find a way to
search in more than one index, and thus more than one core.


Re: how to do partial word searches?

2009-11-25 Thread Erick Erickson
Confession: I haven't had occasion to use the ngram thingy, but here's the
theory
And note that SOLR has n-gram tokenizers available..

Using a 2-gram example for sullivan, the n-gram tokenizer would index these
tokens: su, ul, ll, li, iv, va, an. Then at query time in your example, sulli
would be broken up into su, ul, ll and li. Which, when searched as a phrase,
would then match your field.

The expense, of course, is that your index is larger (but surprisingly not by as
much as you'd think). But your queries are much faster.

That's the theory anyway, the practice is "left as an exercise for the
reader"
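The 2-gram walkthrough above can be sketched in a few lines of Python, with plain string slicing standing in for what an n-gram tokenizer does:

```python
def ngrams(term, n=2):
    """Return the n-grams of a term, i.e. the tokens an n-gram
    tokenizer would index for it."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

indexed = ngrams("sullivan")   # what goes into the index
queried = ngrams("sulli")      # what the query term is broken into
print(indexed)  # ['su', 'ul', 'll', 'li', 'iv', 'va', 'an']
print(queried)  # ['su', 'ul', 'll', 'li']
# A phrase search for the query grams then matches inside "sullivan".
```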

But "the folks" generously provided quite an explication of what wildcards
are
all about on the *lucene* user's list, look for a thread titled
"I just don't get wildcards at all" from around 2006. It's a nice background
for
what the underlying problem is, some of the SOLR tokenizers are realizing
some of this I think. And the state of the art has progressed considerably
since then, but the underlying issues are still there...

Sorry I can't be more help here..
Erick

On Wed, Nov 25, 2009 at 8:18 AM, Joel Nylund  wrote:

> Hi Erick,
>
> thanks for the links, I read both of them and I still have no idea what to
> do, lots of back and forth, but didn't see any solution on it.
>
> One person talked about indexing the field in reverse and doing and ON on
> it, this might work I guess.
>
> thanks
> Joel
>
>
>
> On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:
>
>  copying from Erik Hatcher:
>>
>> See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
>> does not have leading wildcard support enabled.
>>
>> There's a pretty extensive recent exchange on this, see the
>> thread on the user's list titled
>>
>> "leading and trailing wildcard query"
>>
>> Best
>> Erick
>>
>> On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund  wrote:
>>
>>  Hi, I saw some older postings on this, but didnt see a resolution.
>>>
>>> I have a field called title, I would like to be able to find partial word
>>> matches within the title.
>>>
>>> For example:
>>>
>>> http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22
>>>
>>> I would expect it to find:
>>> the daily dish | by andrew sullivan
>>>
>>> but it doesn't. It does find sully (which is fine with me also as a
>>> bonus),
>>> but doesn't seem to get any of the partial word stuff. Oddly enough, before
>>> I
>>> lowercased the title, the wildcard matching seemed to work a bit better;
>>> it
>>> just didn't deal with the case-sensitive query.
>>>
>>> At first I had mixed case titles and I read that the wildcard doesn't
>>> work
>>> with mixed case, so I created another field that is a lowered version of
>>> the
>>> title called "textTitle", it is of type text.
>>>
>>> Is it possible with solr to achieve what I am trying to do, if so how? If
>>> not, anything closer than what I have?
>>>
>>> thanks
>>> Joel
>>>
>>>
>>>
>


Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Erick Erickson
Would it help to combine the indexes into one big one? There are sound reasons
NOT to do this, but if it's a possibility, have you seen the info here?

http://wiki.apache.org/solr/MergingSolrIndexes

Best
Erick

On Wed, Nov 25, 2009 at 9:00 AM, Jörg Agatz wrote:

> > Why do you want to search across cores on the same Solr?
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
> I only need multi-indexing, but I can find no other way to import other indexes.
>
> I have some old indexes from another project, and want to use them in Solr. If I
> use one index it works, but I have a lot of indexes, so I need to find a way to
> search in more than one index, and thus more than one core.
>


Re: how to do partial word searches?

2009-11-25 Thread Robert Muir
Hi, if you are using Solr 1.4, I think you might want to try the type text_rev
(look in the example schema.xml).

Unless I am mistaken:

this will enable leading wildcard support for that field.
it doesn't do any stemming, which I think might be making your wildcards
behave weird.
it also enables reverse wildcard support, so some of your substring matches
will be faster.
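For reference, the key piece of the 1.4 example text_rev index analyzer looks roughly like this; the attribute values are recalled from the example schema, so verify them against your own schema.xml before copying:

```xml
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- also indexes each token reversed, so leading-wildcard queries
         can be rewritten into fast trailing-wildcard lookups -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```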

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund  wrote:

> Hi, I saw some older postings on this, but didnt see a resolution.
>
> I have a field called title, I would like to be able to find partial word
> matches within the title.
>
> For example:
>
> http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22
>
> I would expect it to find:
> the daily dish | by andrew sullivan
>
> but it doesn't. It does find sully (which is fine with me also as a bonus),
> but doesn't seem to get any of the partial word stuff. Oddly enough, before I
> lowercased the title, the wildcard matching seemed to work a bit better; it
> just didn't deal with the case-sensitive query.
>
> At first I had mixed case titles and I read that the wildcard doesn't work
> with mixed case, so I created another field that is a lowered version of the
> title called "textTitle", it is of type text.
>
> Is it possible with solr to achieve what I am trying to do, if so how? If
> not, anything closer than what I have?
>
> thanks
> Joel
>
>


-- 
Robert Muir
rcm...@gmail.com


Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
OK thanks Shalin.

Matt

On Wed, Nov 25, 2009 at 8:48 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell 
> wrote:
>
> > Is there any reason the XMLWriter is declared as final? I'd like to
> extend
> > it for a special case but can't. The other writers (ruby, php, json) are
> > not
> > final.
> >
> >
> I don't think it needs to be final. Maybe it is final because it wasn't
> designed to be extensible. Please open a jira issue.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


RE: Index Splitter

2009-11-25 Thread Giovanni Fernandez-Kincade
You can't really use this if you have an optimized index, right?

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Tuesday, November 24, 2009 6:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Splitter

Giovanni Fernandez-Kincade wrote:
> Hi,
> I've heard about a tool that can be used to split Lucene indexes, for cases 
> where you want to break up a large index into shards. Do you know where I can 
> find it? Any observations/recommendations about its use?
>
> This seems promising but I'm not sure if there is anything more mature out 
> there:
> http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html
>
> Thanks,
> Gio.
>
>   
There are IndexSplitter and MultiPassIndexSplitter tools in 3.0.

https://issues.apache.org/jira/browse/LUCENE-1959

I'd written an article about them before:

http://lucene.jugem.jp/?eid=344

It is in Japanese, but I think you can work out how to use them from the
command lines...

Koji

-- 
http://www.rondhuit.com/en/



Re: Index Splitter

2009-11-25 Thread Koji Sekiguchi

Giovanni Fernandez-Kincade wrote:

You can't really use this if you have an optimized index, right?

  

For optimized index, I think you can use MultiPassIndexSplitter.

Koji

--
http://www.rondhuit.com/en/



Re: how is score computed with hsin functionquery?

2009-11-25 Thread gdeconto


Grant Ingersoll-6 wrote:
> 
> ...
> Yep.  Also note that I added deg() and rad() functions, but for the most
> part is probably better to do the conversion during indexing.
> ...
> 

Thanks Grant.  I hadn't seen the deg and rad functions.  Conversion would be
difficult since I typically work with degrees.  Once I get a bit more
experienced with the Solr code, maybe I can contribute a degree version of
hsin  :-)
-- 
View this message in context: 
http://old.nabble.com/how-is-score-computed-with-hsin-functionquery--tp26504265p26515157.html
Sent from the Solr - User mailing list archive at Nabble.com.



Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

My SOLR_HOME =/home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf in
tomcat.sh

POI, PDFBox, Tika and related jars are under
/home/solr_1_4_0/apache-solr-1.4.0/lib

When I try to index files using the SolrJ API as follows, I don't see the
content of the file being indexed. It only indexes the file size (bytes) and
file type into the "content" field. See the schema definition below as well.
ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

schema.xml has following
  
  

content

And solrconfig.xml has


  content
  content

  

The Luke response is below, which displays the correct count (7) of indexed
documents but no "content" in the index. In the Tomcat logs I don't see any
errors or anything. Unless I am going blind with something, I don't see
anything missing in setting things up. Can anyone advise? Do I need to
include the Tika jars in Tomcat's deployed solr/lib or under /example/lib in
SOLR_HOME?

   
- 
- 
  0 
  28 
  
- 
  7 
  7 
  25 
  1259164190261 
  false 
  true 
  false 
  org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index
 
  2009-11-25T15:50:03Z 
  
- 
- 
  text 
  ITSM-- 
  ITS-- 
  7 
  18 
- 
  3 
  3 
  3 
  3 
  2 
  2 
  1 
  1 
  1 
  1 
  
- 
  12 
  2 
  4 
  
  
- 
  slong 
  I-SO-l 
  I-SO- 
  7 
  7 
- 
  1 
  1 
  1 
  1 
  1 
  1 
  1 
  
- 
  7 
  
  
  
- 
- 
  Indexed 
  Tokenized 
  Stored 
  Multivalued 
  TermVector Stored 
  Store Offset With TermVector 
  Store Position With TermVector 
  Omit Norms 
  Lazy 
  Binary 
  Compressed 
  Sort Missing First 
  Sort Missing Last 
  
  Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents. 
  
  
-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26515579.html
Sent from the Solr - User mailing list archive at Nabble.com.



Looking for Best Practices: Analyzers vs. UpdateRequestProcessors?

2009-11-25 Thread Andreas Kahl
Hello, 

Are there any general criteria for when to use Analyzers to implement an
indexing function, and when it is better to use UpdateRequestProcessors?

The main difference I found in the documentation was that 
UpdateRequestProcessors are able to manipulate several fields at once (create, 
read, update, delete), while Analyzers operate on the contents of a single 
field at once. 
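For concreteness, the two hook in at different places: an
UpdateRequestProcessor is wired into an update chain in solrconfig.xml and
sees the whole input document, while an Analyzer is attached to a field type
in schema.xml and sees one field's text. A minimal sketch (the chain name
and the custom factory class are made up for illustration):

```xml
<!-- solrconfig.xml: processors see the whole SolrInputDocument -->
<updateRequestProcessorChain name="mychain">
  <processor class="com.example.MyFieldMergingProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml: analyzers see one field's token stream at a time -->
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```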

Is that correct so far? Are there any experiences that help decide which 
type of module to use when implementing indexing features? Are there 
differences in processing performance? Is one of the two APIs easier to 
learn/debug, etc.?

If you have any Best Practices with that I would be very interested to hear 
about those. 

Andreas

P.S. My experience with search engines is mainly with FAST, where one uses 
Stages in a Pipeline no matter which feature is being implemented. 


Re: Index Splitter

2009-11-25 Thread Andrzej Bialecki

Koji Sekiguchi wrote:

Giovanni Fernandez-Kincade wrote:

You can't really use this if you have an optimized index, right?

  

For optimized index, I think you can use MultiPassIndexSplitter.


Correct - MultiPassIndexSplitter can handle any index - optimized or 
not, with or without deletions, etc. The cost for this flexibility is 
that it needs to read index files multiple times (hence "multi-pass").




--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
I think no, because there is a crawler for fulltext indexing that
permanently updates the indexes.

Once there is a crawler for documents, Office files etc., I can switch to
Solr completely.


Re: Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

g. I had to include the Tika and related parsing jars into
tomcat/webapps/solr/WEB-INF/lib. This was an embarrassing mistake;
apologies for all the noise. 

Thanks,
-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html
Sent from the Solr - User mailing list archive at Nabble.com.



Batch file upload using solrJ API

2009-11-25 Thread javaxmlsoapdev

Is there an API to upload files over one connection, versus looping through
all the files and creating a new ContentStreamUpdateRequest for each file?
The latter, as expected, doesn't work when there is a large number of files;
it quickly runs into memory problems. Please advise.

Thanks,


-- 
View this message in context: 
http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26518167.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: converting over from sphinx

2009-11-25 Thread Chris Hostetter

: way.  In particular, I'm doing phrase searching into a corpus of
: descriptions, such as "I need help with a foo" where I have a bunch of "foo:
: a foo is a subset of a bar often used to create briznatzes", etc.
: 
: With Sphinx, I could convert "I need help with a foo" into "*need* *help*
: *with* *foo*" and get pretty nice matches. With Solr, my understanding is
: that you can only do wildcard matches on the suffix. In addition, stemming
: only happens on non-wildcard terms. So, my first thought would be to convert
: "I need help with a foo" into "need need* help help* with with* foo foo*".

First off, we need to make sure we have all our terminology in sync -- i'm 
not very familiar with Sphinx, so i'm not sure what types of vernacular 
are used there to describe various things, but in Solr/Lucene you have 
options regarding how you want text to be "analyzed" when it's indexed -- 
this analysis is what converts an arbitrary stream of characters into 
"Terms" that get indexed.  at query time, it's very easy to match on 
terms, or boolean combinations of terms, and sequential phrases of terms 
-- you only need wildcard type functionality if you want to provide a 
wildcard expression that could match more than one individual term.

In your specific example, if you just configured a basic whitespace 
tokenizer when you indexed your documents (ie: "foo: a foo is a subset of 
a bar often used to create briznatzes") then at query time any of the 
individual words ("foo", "bar", etc...) would match that document.  
likewise a phrase query like "need help with foo" would match that text if 
you defined some stop words (like "need" and "with") and specified a small 
amount of slop on your phrase queries.
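for instance, assuming "need", "with" and "a" are stopwords in the field's
analyzer, a sloppy phrase query along these lines would match that text:

```
q="I need help with a foo"~2
```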


The point is: there are a lot of different ways to use Solr, and the 
terminology you are used to with Sphinx may not map exactly to some of the 
terminology you'll see in the Solr docs/configs -- so please feel free to 
ask.

-Hoss



Re: error with multicore CREATE action

2009-11-25 Thread Chris Hostetter
: 
: > Are there any use cases for CREATE where the instance directory
: > *doesn't* yet exist? I ask because I've noticed that Solr will create
: > an instance directory for me sometimes with the CREATE command. In
...
: I guess when you try to add documents and an IndexWriter is opened, the data
: directory is created if it does not exist. Since it calls File#mkdirs, all
: parent directories are also created. I don't think Solr creates those
: directories by itself.

Shalin: I'm confused, wasn't this one of the original use cases for the 
CREATE command as part of the "LotsOfCores" work you and Noble have been 
pushing forward?  I thought one of the goals was that a user could have a 
single solrconfig.xml+schema.xml on disk somewhere, and then at run time 
use the CREATE command to cause many, many new cores to be created (each 
with a new/unique instanceDir).

If that isn't intended (and therefore not handled well) then we should 
probably make the CREATE command test for the existence of the specified 
instanceDir and error if it doesn't already exist -- otherwise a typo in 
an instanceDir file path could lead to some really unexpected behavior.
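for reference, that's the style of CREATE call in question (host and paths
illustrative):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=/path/to/instanceDir&config=solrconfig.xml&schema=schema.xml
```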



-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Chris Hostetter

: I don't think it needs to be final. Maybe it is final because it wasn't
: designed to be extensible. Please open a jira issue.

it really wasn't, and it probably shouldn't be ... there is another thread 
currently in progress (in response to SOLR-1592) about this.

Given how kludgy the entire API is, i'd really prefer it not be made 
un-final .. it would need some serious overhaul/review to make it possible 
to subclass in a sensical way, and coming up with a new API is likely to 
make a lot more sense than trying to retrofit that one.

-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Mattmann, Chris A (388J)
Hey Hoss,

+1. I think we need to overhaul the whole API, even in light of the incremental 
progress I've been proposing and patching, etc., lately.

I think it's good to do that incrementally, though, rather than all at once, 
especially considering SOLR is in 1.5-dev trunk stage atm.

Cheers,
Chris

On 11/25/09 11:33 AM, "Chris Hostetter"  wrote:



: I don't think it needs to be final. Maybe it is final because it wasn't
: designed to be extensible. Please open a jira issue.

it really wasn't, and it probably shouldn't be ... there is another thread
currently in progress (in response to SOLR-1592) about this.

Given how kludgy the entire API is, i'd really prefer it not be made
un-final .. it would need some serious overhaul/review to make it possible
to subclass in a sensical way, and coming up with a new API is likely to
make a lot more sense than trying to retrofit that one.

-Hoss



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




Re: [SolrResourceLoader] Unable to load cached class-name

2009-11-25 Thread Chris Hostetter
: 
: I've deployed the contents of dist/ into JBoss's lib directory for the
: server I'm running and I've also copied the contents of lib/ into

Please be specific ... what is "dist/" what is "lib/" ? ... if you are 
talking about the top level dist and lib directories in a solr release, 
then those should *not* be copied into any directory for JBoss.  
everything you need to access core solr features is available in the 
solr.war -- that is all you need to run the solr application.

the only reason to ever copy any jars around when dealing with solr is to 
load plugins (ie: your own, or things found in the contrib directory of a 
solr release) and even then they should go in the special "lib" directory 
inside your Solr Home directory so they are loaded by the appropriate 
classloader -- not in the top level class loader of your servlet 
container.
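i.e. a layout along these lines (paths illustrative), with plugin jars
next to conf/ rather than on the container's classpath:

```
solr-home/
  conf/
    solrconfig.xml
    schema.xml
  lib/          <-- plugin and contrib jars go here
  data/
```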

: [SolrResourceLoader] Unable to load cached class-name :
: org.apache.solr.search.FastLRUCache for shortname :
: solr.FastLRUCachejava.lang.ClassNotFoundException:
: org.apache.solr.search.FastLRUCache

this is most likely because you have duplicate copies of (all of) the solr 
classes at various classloader levels -- the copies in the solr.war, and 
the copies you've put into the JBoss lib dir.  having both can cause 
problems like this because of the rules involved with 
hierarchical classloaders.



-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
Interesting. Well just to clarify my intentions a bit, I'll quickly explain
what I was trying to do.

I'm using the MLT component but because some of my stored fields are really
big, I don't need (or want) all of the fields for my MLT docs in the
response. I want my MLT docs to have only 2 fields, but I need my main docs
fl to have all fields.

So a simple override of the XMLWriter writeNamedList method would do the
trick. All you have to do is check if the name == "moreLikeThis". If so,
process the docs and specify a different field list. If not, just call
super(). Worked like a charm, but oh well. I really only need the Ruby
response anyway, so I'll move on to that. I'm glad this spurred some
interest though.

-- It'd be great to let components have control over their fl value instead
of having a global fl value for all doc lists within a writer.

Matt

On Wed, Nov 25, 2009 at 2:33 PM, Chris Hostetter
wrote:

>
> : I don't think it needs to be final. Maybe it is final because it wasn't
> : designed to be extensible. Please open a jira issue.
>
> it really wasn't, and it probably shouldn't be ... there is another thread
> currently in progress (in response to SOLR-1592) about this.
>
> Given how kludgy the entire API is, i'd really prefer it not be made
> un-final .. it would need some serious overhaul/review to make it possible
> to subclass in a sensical way, and coming up with a new API is likely to
> make a lot more sense than trying to retrofit that one.
>
> -Hoss
>
>


Re: locks in solr

2009-11-25 Thread Chris Hostetter
:   Is there any article which explains the locks in solr??
: there is some info on solrconfig.txt which says that you can set the lock
: type to none(NoLockFactory), single(SingleInstanceLockFactory),
: NativeFSLockFactory and simple(SimpleFSLockFactory) which locks everytime we
: create a new file.

FYI: That's not at all what the SimpleFSLockFactory does.

Index locking is a pretty low-level Lucene concept -- there isn't really 
anything Solr specific about it.  90% of all Solr users shouldn't need to 
worry about it, ever.  The only time it becomes an issue is if you are 
planning on doing something extremely advanced dealing with the Lucene 
index files directly.

if that's the case: your best bet is to read the Locking code and APIs in 
Lucene, and ask your questions on the java-us...@lucene mailing list.



-Hoss



Re: error with multicore CREATE action

2009-11-25 Thread Shalin Shekhar Mangar
On Thu, Nov 26, 2009 at 12:43 AM, Chris Hostetter
wrote:

> :
> : > Are there any use cases for CREATE where the instance directory
> : > *doesn't* yet exist? I ask because I've noticed that Solr will create
> : > an instance directory for me sometimes with the CREATE command. In
> ...
> : I guess when you try to add documents and an IndexWriter is opened, the
> data
> : directory is created if it does not exist. Since it calls File#mkdirs,
> all
> : parent directories are also created. I don't think Solr creates those
> : directories by itself.
>
> Shalin: I'm confused, wasn't this one of the original use cases for the
> CREATE command as part of the "LotsOfCores" work you and Noble have been
> pushing forward?  I thought one of the goals was that a user could have a
> single solrconfig.xml+schema.xml on disk somewhere, and then at run time
> use the CREATE command to cause many, many new cores to be created (each
> with a new/unique instanceDir).
>
>
Yes, that is correct but those changes are not in trunk right now. We're
planning to spend some time in the next few weeks in splitting that big
patch into smaller ones, adding tests and pushing them into trunk.
LotsOfCores still needs LotsOfWork :)

-- 
Regards,
Shalin Shekhar Mangar.


Re: PatternTokenizer question

2009-11-25 Thread Chris Hostetter

: I think the answer to my question is contained in the wiki when discussing
: the SynonymFilter, "The Lucene QueryParser tokenizes on white space before
: giving any text to the Analyzer".  This would indeed explain what I am
: getting.  Next question - can I avoid that behavior?

it's the nature of the lucene query parser -- whitespace is a "meta 
character" that provides instructions to the parse, just like '+', '\', 
'"', etc...

you could always use a quoted string (so the parser treats all of your 
input as one phrase) or try the "field" QParser (which is essentially the 
same thing as using a quoted phrase but doesn't require the quotes, or 
respect any of the other escape characters)
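for example, using local params syntax (field name assumed):

```
q={!field f=description}I need help with a foo
```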



-Hoss



Re: PatternTokenizer question

2009-11-25 Thread Walter Underwood
I think you are looking for a "free text" query parser. Essentially, it would 
use the analyzer chain, but then turn the token stream into a query tree. Very 
useful if you are searching for a band named "+/-".

wunder

On Nov 25, 2009, at 12:00 PM, Chris Hostetter wrote:

> 
> : I think the answer to my question is contained in the wiki when discussing
> : the SynonymFilter, "The Lucene QueryParser tokenizes on white space before
> : giving any text to the Analyzer".  This would indeed explain what I am
> : getting.  Next question - can I avoid that behavior?
> 
> it's the nature of the lucene query parser -- whitespace is a "meta 
> character" that provides instructions to the parse, just like '+', '\', 
> '"', etc...
> 
> you could always use a quoted string (so the parser treats all of your 
> input as one phrase) or try the "field" QParser (which is essentially the 
> same thing as using a quoted phrase but doesn't require the quotes, or 
> respect any of the other escape characters)
> 
> 
> 
> -Hoss
> 



solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund
Hi, if I try to use any other hostname, jetty doesn't work: it gives a
blank page, and if I telnet to the server/port it just disconnects.


I tried editing the scripts.conf to change the hostname, but that didn't
seem to help.


For example I tried editing my etc/hosts file and added:

127.0.0.1 solriscool

then:
ping solriscool
PING solriscool (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms


sh-3.2# telnet solriscool 8983
Trying 127.0.0.1...
Connected to solriscool.
Escape character is '^]'.
GET / HTTP/1.1
Connection closed by foreign host.


telnet localhost 8983
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /solr HTTP/1.1
Host: localhost

HTTP/1.1 302 Found
Location: http://localhost/solr/
Content-Length: 0
Server: Jetty(6.1.3)


any ideas?

thanks
Joel



param "version" and diferences in /admin/ping response

2009-11-25 Thread Nestor Oviedo
Hi everyone!
Can anyone tell me the meaning of the param "version"? There
isn't anything about it in the Solr documentation.

When I invoke the /admin/ping url, if the version value is between 0
and 2.1, the response looks like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
      <str name="version">2.1</str>
    </lst>
  </lst>
  <str name="status">OK</str>
</response>

And when the version value is anything different from that range, the
response looks like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
    </lst>
  </lst>
  <str name="status">OK</str>
</response>

Thanks.
Regards
Nestor Oviedo


Re: param "version" and diferences in /admin/ping response

2009-11-25 Thread Chris Hostetter
: Hi everyone!
: Can anyone tell me what's the meaning of the param "version" ?? There
: isn't anything about it in the Solr documentation.

http://wiki.apache.org/solr/XMLResponseFormat#A.27version.27

-Hoss



Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread simon
first, check what port 8983 is bound to - should be listening on all
interfaces

netstat -an |grep 8983

You should see

tcp        0      0  0.0.0.0:8983           0.0.0.0:*              LISTEN

-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund  wrote:

> Hi, if I try to use any other hostname jetty doesnt work, gives a blank
> page, if I telnet too the server/port it just disconnects.
>
> I tried editing the scripts.conf to change the hostname, that didnt seem to
> help.
>
> For example I tried editing my etc/hosts file and added:
>
> 127.0.0.1 solriscool
>
> then:
> ping solriscool
> PING solriscool (127.0.0.1): 56 data bytes
> 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms
>
>
> sh-3.2# telnet solriscool 8983
> Trying 127.0.0.1...
> Connected to solriscool.
> Escape character is '^]'.
> GET / HTTP/1.1
> Connection closed by foreign host.
>
>
> telnet localhost 8983
> Trying ::1...
> Connected to localhost.
> Escape character is '^]'.
> GET /solr HTTP/1.1
> Host: localhost
>
> HTTP/1.1 302 Found
> Location: http://localhost/solr/
> Content-Length: 0
> Server: Jetty(6.1.3)
>
>
> any ideas?
>
> thanks
> Joel
>
>


Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

I see:

tcp46      0      0  *.8983                 *.*                    LISTEN
tcp4       0      0  127.0.0.1.8983         *.*                    LISTEN


thanks
Joel

On Nov 25, 2009, at 5:21 PM, simon wrote:


first, check what port 8983 is bound to - should be listening on all
interfaces

netstat -an |grep 8983

You should see

tcp        0      0  0.0.0.0:8983           0.0.0.0:*              LISTEN


-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund   
wrote:


Hi, if I try to use any other hostname jetty doesnt work, gives a  
blank

page, if I telnet too the server/port it just disconnects.

I tried editing the scripts.conf to change the hostname, that didnt  
seem to

help.

For example I tried editing my etc/hosts file and added:

127.0.0.1 solriscool

then:
ping solriscool
PING solriscool (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms


sh-3.2# telnet solriscool 8983
Trying 127.0.0.1...
Connected to solriscool.
Escape character is '^]'.
GET / HTTP/1.1
Connection closed by foreign host.


telnet localhost 8983
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /solr HTTP/1.1
Host: localhost

HTTP/1.1 302 Found
Location: http://localhost/solr/
Content-Length: 0
Server: Jetty(6.1.3)


any ideas?

thanks
Joel






Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread simon
On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund  wrote:

> I see:
>
> tcp46      0      0  *.8983                 *.*                    LISTEN
> tcp4       0      0  127.0.0.1.8983         *.*                    LISTEN
>

Not the same version of linux/netstat as mine, but I'd guess that the
second line is the key to the problem - looks as though TCP over IPv4 is
only listening on the localhost interface, which is a network configuration
issue.
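If so, one workaround is to bind Jetty's connector explicitly in
etc/jetty.xml (a sketch for Jetty 6; adjust to match your existing
connector definition):

```xml
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <!-- 0.0.0.0 listens on all interfaces -->
      <Set name="host">0.0.0.0</Set>
      <Set name="port">8983</Set>
    </New>
  </Arg>
</Call>
```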

what does the Solr log say after it's started - should be a line

 INFO:  Started SelectChannelConnector @ 0.0.0.0:8983


-Simon


> thanks
> Joel
>
>
> On Nov 25, 2009, at 5:21 PM, simon wrote:
>
>  first, check what port 8983 is bound to - should be listening on all
>> interfaces
>>
>> netstat -an |grep 8983
>>
>> You should see
>>
>> tcp0  0 0.0.0.0:8983  0.0.0.0:*   LISTEN
>>
>> -Simon
>>
>> On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund  wrote:
>>
>>  Hi, if I try to use any other hostname jetty doesnt work, gives a blank
>>> page, if I telnet too the server/port it just disconnects.
>>>
>>> I tried editing the scripts.conf to change the hostname, that didnt seem
>>> to
>>> help.
>>>
>>> For example I tried editing my etc/hosts file and added:
>>>
>>> 127.0.0.1 solriscool
>>>
>>> then:
>>> ping solriscool
>>> PING solriscool (127.0.0.1): 56 data bytes
>>> 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
>>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms
>>>
>>>
>>> sh-3.2# telnet solriscool 8983
>>> Trying 127.0.0.1...
>>> Connected to solriscool.
>>> Escape character is '^]'.
>>> GET / HTTP/1.1
>>> Connection closed by foreign host.
>>>
>>>
>>> telnet localhost 8983
>>> Trying ::1...
>>> Connected to localhost.
>>> Escape character is '^]'.
>>> GET /solr HTTP/1.1
>>> Host: localhost
>>>
>>> HTTP/1.1 302 Found
>>> Location: http://localhost/solr/
>>> Content-Length: 0
>>> Server: Jetty(6.1.3)
>>>
>>>
>>> any ideas?
>>>
>>> thanks
>>> Joel
>>>
>>>
>>>
>


Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

yes says:

2009-11-25 18:08:59.967::INFO:  Started SocketConnector @ 0.0.0.0:8983

running on osx

thanks
Joel


On Nov 25, 2009, at 6:00 PM, simon wrote:

On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund   
wrote:



I see:

tcp46      0      0  *.8983                 *.*                    LISTEN
tcp4       0      0  127.0.0.1.8983         *.*                    LISTEN




Not the same version of linux/netstat as mine, but I'd guess that the
second line is the key to the problem - looks as though TCP over IPv4 is
only listening on the localhost interface, which is a network
configuration issue.

what does the Solr log say after it's started - should be a line

INFO:  Started SelectChannelConnector @ 0.0.0.0:8983


-Simon



thanks
Joel


On Nov 25, 2009, at 5:21 PM, simon wrote:

first, check what port 8983 is bound to - should be listening on all

interfaces

netstat -an |grep 8983

You should see

tcp        0      0  0.0.0.0:8983           0.0.0.0:*              LISTEN


-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund   
wrote:


Hi, if I try to use any other hostname jetty doesnt work, gives a  
blank

page, if I telnet too the server/port it just disconnects.

I tried editing the scripts.conf to change the hostname, that  
didnt seem

to
help.

For example I tried editing my etc/hosts file and added:

127.0.0.1 solriscool

then:
ping solriscool
PING solriscool (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms


sh-3.2# telnet solriscool 8983
Trying 127.0.0.1...
Connected to solriscool.
Escape character is '^]'.
GET / HTTP/1.1
Connection closed by foreign host.


telnet localhost 8983
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /solr HTTP/1.1
Host: localhost

HTTP/1.1 302 Found
Location: http://localhost/solr/
Content-Length: 0
Server: Jetty(6.1.3)


any ideas?

thanks
Joel









Re: Deduplication in 1.4

2009-11-25 Thread KaktuChakarabati

Hey Otis,
Yep, I realized this myself after playing some with the dedupe feature
yesterday.
So it does look like Field collapsing is what I need pretty much.
Any idea on how close it is to being production-ready?
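For context, the dedupe chain I had enabled was along these lines (adapted
from the wiki Deduplication page; exact values hedged):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- computed signature field; must exist in schema.xml -->
    <str name="signatureField">duplicate_signature</str>
    <bool name="overwriteDupes">false</bool>
    <!-- fields used to compute the signature -->
    <str name="fields">duplicate_group_id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```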

Thanks,
-Chak

Otis Gospodnetic wrote:
> 
> Hi,
> 
> As far as I know, the point of deduplication in Solr (
> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
> document before indexing it in order to avoid duplicates in the index in
> the first place.
> 
> What you are describing is closer to field collapsing patch in SOLR-236.
> 
>  Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
>> From: KaktuChakarabati 
>> To: solr-user@lucene.apache.org
>> Sent: Tue, November 24, 2009 5:29:00 PM
>> Subject: Deduplication in 1.4
>> 
>> 
>> Hey,
>> I've been trying to find some documentation on using this feature in 1.4
>> but
>> Wiki page is a little sparse.
>> In specific, here's what i'm trying to do:
>> 
>> I have a field, say 'duplicate_group_id' that i'll populate based on some
>> offline documents deduplication process I have.
>> 
>> All I want is for solr to compute a 'duplicate_signature' field based on
>> this one at update time, so that when i search for documents later, all
>> documents with same original 'duplicate_group_id' value will be rolled up
>> (e.g i'll just get the first one that came back  according to relevancy).
>> 
>> I enabled the deduplication processor and put it into updater, but i'm
>> not
>> seeing any difference in returned results (i.e results with same
>> duplicate_id are returned separately..)
>> 
>> is there anything i need to supply in query-time for this to take effect?
>> what should be the behaviour? is there any working example of this?
>> 
>> Anything will be helpful..
>> 
>> Thanks,
>> Chak
>> -- 
>> View this message in context: 
>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html
Sent from the Solr - User mailing list archive at Nabble.com.



Date ranges for indexes constructed outside Solr

2009-11-25 Thread Phil Hagelberg

I'm working on an application that will build indexes directly using the
Lucene API, but will expose them to clients using Solr. I'm seeing
plenty of documentation on how to support date range fields in Solr,
but they all assume that you are inserting documents through Solr rather
than merging already-generated indexes.

Where can I find details about the Lucene-level field operations that
can be used to generate date fields that Solr will work with? In
particular date resolution settings are unclear.
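For what it's worth, Solr 1.4's DateField stores and indexes dates as plain
strings in the canonical ISO-8601 UTC form with a trailing 'Z', so an index
built directly with Lucene should write the same textual form into an
untokenized field (a sketch; the field name and any resolution rounding are
assumptions):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {

    // Solr's DateField sorts and range-queries dates lexicographically,
    // so the indexed token must be the canonical UTC form
    // yyyy-MM-dd'T'HH:mm:ss'Z'.
    public static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // Index this string as an untokenized, indexed Lucene field, e.g.
        // new Field("timestamp", toSolrDate(d), Field.Store.YES,
        //           Field.Index.NOT_ANALYZED_NO_NORMS)
        System.out.println(toSolrDate(new Date(0L)));
    }
}
```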

On a similar note: how much of schema.xml is relevant in cases where
Solr is not performing insertions? Obviously defaultSearchField is as
well as the solrQueryParser defaultOperator attribute, but it seems like
most of the field declarations might not matter.

thanks,
Phil


Re: Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread Juan Pedro Danculovic
Hi! Does your example finally work? I index the data with SolrJ and I have
the same problem: I could not retrieve file data.


On Wed, Nov 25, 2009 at 3:41 PM, javaxmlsoapdev  wrote:

>
> g. I had to include tika and related parsing jars into
> tomcat/webapps/solr/WEB-INF/lib.. this was an embarrassing mistake.
> apologies for all the noise.
>
> Thanks,
> --
> View this message in context:
> http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-25 Thread Rahul R
Hello,
Would really appreciate any inputs/suggestions on this. Thank you.



On Tue, Nov 24, 2009 at 10:59 PM, Rahul R  wrote:

> Hello,
> In our application we have a catch-all field (the 'text' field) which is
> configured as the default search field. Now this field will have a
> combination of numbers, alphabets, special characters etc. I have a
> requirement that the WordDelimiterFilterFactory should not act on numbers,
> especially those with decimal points. Accuracy of results with relevance to
> numerical data is quite important, So if the text field of a document has
> data like "Bridge-Diode 3.55 Volts", I want to make sure that a search for
> "355" or "35.5" does not retrieve this document. So I found the following
> setting for the WordDelimiterFilterFactory to work for me (for the most part):
> <filter class="solr.WordDelimiterFilterFactory"
> generateNumberParts="0" catenateWords="1" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1"/>
>
> I am using the same setting for both index and query.
>
> Now the only problem is, if I have data like ".355". With the above
> setting, the analysis jsp shows me that WordDelimiterFilterFactory is
> creating term texts as both ".355" and "355". So a search for ".355"
> retrieves documents containing both ".355" and "355". A search for "355"
> also has the same effect. I noticed that when the entry for the
> WordDelimiterFilterFactory was completely removed (both index and query),
> then the above problem was resolved. But this seems too harsh a measure.
>
> Is there a way by which I can prevent the WordDelimiterFilterFactory from
> totally acting on numerical data ?
>
> Regards
> Rahul
>