Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread janne mattila
How did it work before the SOLR-811 update? I don't understand. Did it
fetch the delta data with two queries (1. get the ids, 2. get the data for
each id), or did it fetch all of the delta data with a single query?

On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan  wrote:
>> 2. If not - what's the reason delta import is implemented
>> like it is?
>> Why split it in two queries? I would think having a single
>> delta query
>> that fetches the data would be kind of an "obvious" design
>> unless
>> there's something that calls for 2 separate queries...?
>
> I think this is it? https://issues.apache.org/jira/browse/SOLR-811


Re: Auto-complete phrase

2012-03-27 Thread William Bell
I am also very confused by the use case for the Suggester component.
With collate on, it will try to combine random words together, not the
actual phrases that are there.

I get better mileage out of EDGE grams, tokenizing on whitespace...
Left to right... Since that is how most people think.

However, I would like Suggester to work as follows:

Index:
Chris Smith
Tony Dawson
Chris Leaf
Daddy Golucky

Query:
1. For "Chris" it returns "Chris Leaf" but not both Chris Smith and Chris Leaf.
2. I seem to get collated results (take the first word and combine it with the
second word), so I would see things like "Smith Leaf". Very strange and
not what we expect. These are formal names.

When I use Ngrams I can index:

C
Ch
Chr
Chri
Chris
S
Sm
Smi
Smit
Smith

Thus if I search on "Smi" it will match Chris Smith and also Chris
Leaf. Exactly what I want.
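For reference, a minimal sketch of this kind of edge-gram analysis, assuming a
schema.xml fieldType along these lines (the type name and gram sizes here are
invented for illustration):

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index-time only: expand each token into its leading prefixes -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With that in place, a search for "Smi" is a plain term match against the
indexed prefixes of "Smith", which is the behavior described above.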




On Tue, Mar 27, 2012 at 11:05 AM, Rémy Loubradou  wrote:
> Hello, I am working on creating an auto-complete functionality for the field
> merchant_name, present all over my documents. I am using version 3.4 of
> Solr and I am trying to take advantage of the Suggester functionality.
> Unfortunately, so far I haven't figured out how to make it work as I
> expected.
>
> If the list of merchants present in my documents is (my real list is bigger
> than the following list, which is why I don't use a dictionary file, and
> also because it will change often):
> Redoute
> Suisse Trois
> Conforama
> But
> Cult Beauty
> Brother Trois
>
> I expect the Suggester component to match words, or parts of them, and
> return the phrases in which those words or parts were matched.
> For example, with /suggest?q=tro I would like to get this:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <lst name="tro">
>         <int name="numFound">2</int>
>         <int name="startOffset">0</int>
>         <int name="endOffset">x</int>
>         <arr name="suggestion">
>           <str>Brother Trois</str>
>           <str>Suisse Trois</str>
>         </arr>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
> I experimented with suggestions on a field configured with the tokenizer
> "solr.KeywordTokenizerFactory" or "solr.WhitespaceTokenizerFactory".
> In my mind I have to find a way to handle 3 cases:
> /suggest?q=bo ->(should return) brother trois
> /suggest?q=tro ->(should return) brother trois, suisse trois
> /suggest?q=bo%20tro ->(should return) brother trois
>
> With the "solr.KeywordTokenizerFactory" I get:
> /suggest?q=bo -> brother trois
> /suggest?q=tro -> "nothing"
> /suggest?q=bo%20tro -> "nothing"
>
> With the "solr.WhitespaceTokenizerFactory" I get:
> /suggest?q=bo -> brother
> /suggest?q=troi -> trois
> /suggest?q=bo%20tro -> brother, trois
>
> Not exactly what I want ... :(
>
> My configuration in the file solrconfig.xml for the suggester component:
>
> <searchComponent name="suggestMerchant" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggestMerchant</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>     <str name="field">merchant_name_autocomplete</str>
>     <float name="threshold">0.0</float>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> <requestHandler class="org.apache.solr.handler.component.SearchHandler"
>     name="/suggest/merchant">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggestMerchant</str>
>     <str name="spellcheck.onlyMorePopular">true</str>
>     <str name="spellcheck.count">10</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.maxCollations">10</str>
>   </lst>
>   <arr name="components">
>     <str>suggestMerchant</str>
>   </arr>
> </requestHandler>
>
> How can I implement autocomplete with the Suggester component to get what I
> expect? Thanks for your help, I really appreciate it.



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mike Sokolov

On 3/27/2012 11:14 AM, Mark Miller wrote:

On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:


On 3/26/2012 6:43 PM, Mark Miller wrote:

It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.

It looks like StreamingUpdateSolrServer is not meant for situations where strict error 
checking is required.  I think the documentation should reflect that.  Would you be 
opposed to a javadoc update at the class level (plus a wiki addition) like the following? 
"Because document inserts are handled as background tasks, exceptions and errors 
that occur during those operations will not be available to the calling program, but they 
will be logged.  For example, if the Solr server is down, your program must determine 
this on its own.  If you need strict error handling, use CommonsHttpSolrServer."  If 
my wording is bad, feel free to make suggestions.

It might make sense to accumulate the errors in a fixed-size queue and 
report them either when the queue fills up or when the client commits 
(assuming the commit will wait for all outstanding inserts to complete 
or fail).  This is what we do client-side when performing multi-threaded 
inserts.  Sounds great in theory, I think, but then I haven't delved into 
SUSS at all ... just a suggestion, take it or leave it.  Actually I 
wonder whether SUSS is necessary if you do the threading client-side?  
You might get a similar perf gain; I know we see a substantial speedup 
that way, because then your updates spawn multiple threads in the 
server anyway, don't they?


- Mike


Solr with UIMA

2012-03-27 Thread chris3001
I am having a hard time integrating UIMA with Solr. I have downloaded the
Solr 3.5 dist and have it successfully running with nutch and tika on
windows 7 using solrcell and curl via cygwin. To begin, I copied the 6 jars
from solr/contrib/uima/lib to the working /lib in solr. Next, I read the
readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml
and schema.xml accordingly, to no avail. I then found this link, which seemed
a bit more applicable since I didn't care to use Alchemy or OpenCalais:
http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1 Still,
when I run a curl command that imports a pdf via solrcell I do not get the
additional UIMA fields, nor do I get anything in my logs. The test.pdf is
parsed though, and I see the pdf in Solr using:
curl
'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true'
-F "file=@test.pdf"

What I added to my solrconfig.xml:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">C:\web\solrcelluimacrawler\com\rondhuit\uima\desc\KeyphraseExtractAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">com.rondhuit.uima.yahoo.Keyphrase</str>
          <lst name="mapping">
            <str name="feature">keyphrase</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
I also adjusted my requestHandler:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>

Finally, my added entries in my schema.xml:

<field name="UIMAname" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="..." type="..." indexed="true" stored="true"/>

All I am trying to do is get *any* UIMA AE working in Solr, and I cannot figure
out what I am doing wrong. Thank you in advance for reading this.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3863324.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Could you please show me how to get those values inside TikaEntityProcessor?

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: March 27, 2012 22:43
To: solr-user@lucene.apache.org
Subject: Re: how to store file path in Solr when using TikaEntityProcessor


> I am using DIH to index the local file system. But the file path, size and
> lastmodified fields were not stored. In the schema.xml I defined:
>
> <fields>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
> </fields>
>
> And also defined tika-data-config.xml:
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource" />
>   <document>
>     <entity name="f" dataSource="null" rootEntity="false"
>             processor="FileListEntityProcessor"
>             baseDir="E:/my_project/ecmkit/infotouch"
>             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>             onError="skip"
>             recursive="true">
>       <entity name="tika-test" dataSource="bin"
>               processor="TikaEntityProcessor"
>               url="${f.fileAbsolutePath}" format="text"
>               onError="skip">
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> The Solr version is 3.5. Any idea?

The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified 
are generated by the FileListEntityProcessor. They should be defined above the 
TikaEntityProcessor.  
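A sketch of what that mapping might look like (untested; the Solr-side field
names "path", "size" and "lastmodified" are assumptions, not from the original
config):

<entity name="f" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:/my_project/ecmkit/infotouch"
        fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
        onError="skip" recursive="true">
  <!-- map the implicit FileListEntityProcessor fields here, in the outer entity -->
  <field column="fileAbsolutePath" name="path" />
  <field column="fileSize" name="size" />
  <field column="fileLastModified" name="lastmodified" />
  <entity name="tika-test" dataSource="bin"
          processor="TikaEntityProcessor"
          url="${f.fileAbsolutePath}" format="text" onError="skip">
    <!-- Tika-extracted fields (text, title, ...) go here -->
  </entity>
</entity>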


Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread Koji Sekiguchi

What does your sequence field look like in schema.xml, fieldType and field?
And what version are you using?

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/03/27 13:06), neosky wrote:

All of my highlights have a one-character mistake in the offset. Some fragments
from my response follow. Thanks!



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">259</int>
    <lst name="params">
      <str name="hl">on</str>
      <str name="hl.fl">sequence</str>
      <str name="indent">true</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl">*,score</str>
      <str name="...">true</str>
      <str name="start">0</str>
      <str name="q">sequence:NGNFN</str>
    </lst>
  </lst>
  ...
  <lst name="highlighting">
    <lst name="...">
      <arr name="sequence">
        <str>TSQSELSNGNFNRRPKIELSNFDGNHPKTWIRKC</str>
      </arr>
    </lst>
    <lst name="...">
      <arr name="sequence">
        <str>GENTRERNGNFNSLTRERSFAELENHPPKVRRNGSEG</str>
      </arr>
    </lst>
    <lst name="...">
      <arr name="sequence">
        <str>EGRYPCNNGNFNLTTGRCVCEKNYVHLIYEDRI</str>
      </arr>
    </lst>
    <lst name="...">
      <arr name="sequence">
        <str>YAEENYINGNFNEEPY</str>
      </arr>
    </lst>
    <lst name="...">
      <arr name="sequence">
        <str>KEVADDCNGNFNQPTGVRI</str>
      </arr>
    </lst>
  </lst>
</response>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860283p3860283.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread neosky
My current version is Solr 3.5. It should be the most up to date.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862872.html
Sent from the Solr - User mailing list archive at Nabble.com.


Unload(true) doesn't delete index files when unloading a core

2012-03-27 Thread vybe3142
From what I understand, isn't deletion of the index files the expected result?

Thanks


public int drop(..., boolean removeIndex)   // removeIndex passed in as true
        throws Exception {
    String coreName = ...;
    Unload req = new Unload(removeIndex);
    req.setCoreName(coreName);
    SolrServer adminServer = buildAdminServer();
    ...
    // removes the reference to the Solr core in solr.xml,
    // but doesn't delete the index files
    return req.process(adminServer).getStatus();
}

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unload-true-doesn-t-delele-Index-file-when-unloading-a-core-tp3862816p3862816.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
Assuming you're just using this field for spellcheck and not for queries, then 
it doesn't matter.  But the correct way to do it is to have it in both places.
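For illustration, a sketch of "both places" (untested; the type name and
stopword file name are invented):

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="spellcheck_stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="spellcheck_stopwords.txt"/>
  </analyzer>
</fieldType>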

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Tuesday, March 27, 2012 3:42 PM
To: solr-user@lucene.apache.org
Subject: RE: preventing words from being indexed in spellcheck dictionary?

hello,

should i apply the StopFilterFactory at index time or query time?

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml

--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3862722.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello,

should i apply the StopFilterFactory at index time or query time?

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml

--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3862722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread Ahmet Arslan

Can you reproduce the problem with latest trunk? 


> Does anyone know whether it is a bug or not?
> I use Ngram in my index.
> 
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="5" maxGramSize="5"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>   </analyzer>
> </fieldType>
>
> ...
>
> <field name="sequence" type="..." indexed="true" stored="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862326.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 


Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread neosky
Does anyone know whether it is a bug or not?
I use Ngram in my index.

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="5" maxGramSize="5"/>
  </analyzer>
</fieldType>

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>

...

<field name="sequence" type="..." indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862326.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: First steps with Solr

2012-03-27 Thread Erik Hatcher
Note that the VelocityResponseWriter puts a tool in the context to escape 
various things.  See the "Velocity Context" section here: 
.  That'll take you to this 


You can do $esc.url($some_variable) to URL encode _pieces_ of a URL.   You can 
see the use of $esc in VM_global_library.vm and some of the other templates 
that ship with Solr.  
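For instance, a minimal sketch (template and variable names invented) of
escaping just one piece of a URL in a Velocity template:

## URL-encode only the query value, not the whole link
#set($q = $esc.url($some_variable))
<a href="/solr/browse?q=$q">$some_variable</a>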

Erik


On Mar 27, 2012, at 10:00 , Marcelo Carvalho Fernandes wrote:

> I've had the same problem and my solution was to...
> 
> #set($pName = "#field('name')")
> #set($pName = $pName.trim())
> 
> 
> Marcelo Carvalho Fernandes
> +55 21 8272-7970
> +55 21 2205-2786
> 
> 
> On Mon, Mar 26, 2012 at 3:24 PM, henri.gour...@laposte.net <
> henri.gour...@laposte.net> wrote:
> 
>> trying to play with javascript to clean-up my URL!!
>> Context is velocity
>> 
>> 
>> 
>> Suggestions?
>> Thanks
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: Using the ids parameter

2012-03-27 Thread Jamie Johnson
Yes, sorry for the delay, we now do q=key:("key1" "key2"...) and that
works properly.
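For example (assuming the unique key field is named id, and using the ids from
the stack trace below):

/select?q=id:("4f14cc9b-f669-4d6f-85ae-b22fad143492"
    OR "urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696"
    OR "urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4")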

On Tue, Mar 27, 2012 at 3:53 AM, Dmitry Kan  wrote:
> So I solved it by using key:(id1 OR ... idn).
>
> On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan  wrote:
>
>> Hi,
>>
>> Actually we ran into the same issue with using the ids parameter, in the solr
>> front with a shards architecture (the exception is thrown in the solr front). Were
>> you able to solve it by using the key:value syntax or some other way?
>>
>> BTW, there was a related issue:
>> https://issues.apache.org/jira/browse/SOLR-1477
>> but it's marked as Won't Fix, does anyone know why it is so, or if this is
>> planned to be resolved?
>>
>> Dmitry
>>
>>
>> On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson  wrote:
>>
>>> We're running into an issue where we are trying to use the ids=
>>> parameter to return a set of documents given their id.  This seems to
>>> work intermittently when running in SolrCloud.  The first question I
>>> have is this something that we should be using or instead should we
>>> doing a query with key:?  The stack trace that I am getting right now
>>> is included below, any thoughts would be appreciated.
>>>
>>> Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
>>> INFO: [slice1_shard1] webapp=/solr path=/select
>>>
>>> params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
>>> status=500 QTime=32
>>> Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
>>> SEVERE: null:java.lang.NullPointerException
>>>  at
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
>>>  at
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
>>>  at
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
>>>  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
>>>  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
>>>  at
>>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
>>>  at
>>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
>>>  at
>>> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>>>  at
>>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
>>>  at
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
>>>  at
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
>>>  at
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
>>>  at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
>>>  at
>>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>  at
>>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>  at
>>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>  at
>>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>  at
>>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>  at
>>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>  at
>>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>>  at
>>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>  at org.mortbay.jetty.Server.handle(Server.java:326)
>>>  at
>>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>  at
>>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>  at
>>> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>>>  at
>>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>
>>
>>
>>
>
>
> --
> Regards,
>
> Dmitry Kan


Auto-complete phrase

2012-03-27 Thread Rémy Loubradou
Hello, I am working on creating an auto-complete functionality for the field
merchant_name, present all over my documents. I am using version 3.4 of
Solr and I am trying to take advantage of the Suggester functionality.
Unfortunately, so far I haven't figured out how to make it work as I
expected.

If the list of merchants present in my documents is (my real list is bigger
than the following list, which is why I don't use a dictionary file, and
also because it will change often):
Redoute
Suisse Trois
Conforama
But
Cult Beauty
Brother Trois

I expect the Suggester component to match words, or parts of them, and
return the phrases in which those words or parts were matched.
For example, with /suggest?q=tro I would like to get this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tro">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">x</int>
        <arr name="suggestion">
          <str>Brother Trois</str>
          <str>Suisse Trois</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

I experimented with suggestions on a field configured with the tokenizer
"solr.KeywordTokenizerFactory" or "solr.WhitespaceTokenizerFactory".
In my mind I have to find a way to handle 3 cases:
/suggest?q=bo ->(should return) brother trois
/suggest?q=tro ->(should return) brother trois, suisse trois
/suggest?q=bo%20tro ->(should return) brother trois

With the "solr.KeywordTokenizerFactory" I get:
/suggest?q=bo -> brother trois
/suggest?q=tro -> "nothing"
/suggest?q=bo%20tro -> "nothing"

With the "solr.WhitespaceTokenizerFactory" I get:
/suggest?q=bo -> brother
/suggest?q=troi -> trois
/suggest?q=bo%20tro -> brother, trois

Not exactly what I want ... :(

My configuration in the file solrconfig.xml for the suggester component:

<searchComponent name="suggestMerchant" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggestMerchant</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">merchant_name_autocomplete</str>
    <float name="threshold">0.0</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
    name="/suggest/merchant">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestMerchant</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="components">
    <str>suggestMerchant</str>
  </arr>
</requestHandler>

How can I implement autocomplete with the Suggester component to get what I
expect? Thanks for your help, I really appreciate it.


Re: First steps with Solr

2012-03-27 Thread Marcelo Carvalho Fernandes
I've had the same problem and my solution was to...

#set($pName = "#field('name')")
#set($pName = $pName.trim())


Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


On Mon, Mar 26, 2012 at 3:24 PM, henri.gour...@laposte.net <
henri.gour...@laposte.net> wrote:

> trying to play with javascript to clean-up my URL!!
> Context is velocity
>
>
>
> Suggestions?
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrCloud with Tomcat and external Zookeeper, does it work?

2012-03-27 Thread jerry.min...@gmail.com
Hi Vadim,

I too am experimenting with SolrCloud and need help with setting it up
using Tomcat as the java servlet container.
While searching for help on this question, I found another thread in
the solr-mailing-list that is helpful.
In case you haven't seen this thread that I found, please search the
solr-mailing-list for: "SolrCloud new"
You can also view it at nabble using this link:
http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html

Best,
Jerry M.




On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann
 wrote:
>
> Hello folks,
>
> i read the SolrCloud wiki and Bruno Dumon's blog entry with his "First
> Exploration of SolrCloud".
> The examples and a first setup with embedded Jetty and ZK WORK without problems.
>
> I tried to set up my own configuration with Tomcat and an external
> Zookeeper (my master ZK), but it doesn't really work.
>
> My setup:
> - latest Solr version from trunk
> - Tomcat 6
> - external ZK
> - Target: 1 Server, 1 Tomcat, 1 Solr instance, 2 collections with
> different config/schema
>
> What i tried:
> --
> 1. After checkout i built solr (ant run-example), and it works.
> ---
> 2. I send my config/schema files to external ZK with Jetty:
> java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/
> -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar
> it works, too.
> ---
> 3. I created my ("empty", without cores) solr.xml, like Bruno:
> http://www.ngdata.com/site/blog/57-ng.html#disqus_thread
> ---
> 4. I started my Tomcat, and got the first error in the UI: This interface
> requires that you activate the admin request handlers, add the following
> configuration to your solrconfig.xml:
>
> <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
>
> Admin request handlers are definitely activated in my solrconfig.
>
> I get this error only with the latest trunk versions, not with r1292064
> from February. Sometimes it works with the new version, sometimes not,
> and i get this error.
>
> --
> 5. Ok, it works after a few restarts. i changed my JAVA_OPTS for
> Tomcat and added this: "-DzkHost=master-zk:2181"
> Next error:
> The web application [/solr2] appears to have started a thread
> named [main-SendThread(master-zk:2181)] but has failed to stop it.
> This is very likely to create a memory leak.
> Exception in thread "Thread-2" java.lang.NullPointerException
> at 
> org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
> at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
> at java.lang.Thread.run(Thread.java:662)
> 15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
> INFO: Illegal access: this web application instance has been stopped
> already. Could not load org.apache.zookeeper.server.ZooTrace. The
> eventual following stack trace is caused by an error thrown for
> debugging purposes as well as to attempt to terminate the thread which
> caused the illegal access, and has no functional impact.
> java.lang.IllegalStateException
> at 
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
> at 
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
> 15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy
>
> -
> 6. Ok, we assume that the first steps work, and i would create new
> cores and my 2 collections. My requests with CoreAdminHandler are ok,
> my solr.xml looks like this:
> <solr ...>
>   <cores ... hostContext="solr">
>     <core name="shard1_data"
>           collection="col1"
>           shard="shard1"
>           instanceDir="xxx/" />
>     <core name="shard2_data"
>           collection="col2"
>           shard="shard2"
>           instanceDir="xx2/" />
>   </cores>
> </solr>
>
> Now i get the following exception: "...couldn't find conf name for
> collection1..."
> I don't have a collection1. Why this exception?
>
> ---
> You can see, there are too many exceptions and possibly
> configuration problems with Tomcat and an external ZK.
> Has anyone set up an "identical" configuration and does it work?
> Does anyone detect mistakes in my configuration steps?
>
> Best regards
> Vadim


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
thank you very much for the info ;)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861987.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Erick Erickson
https://issues.apache.org/jira/browse/SOLR-445

This JIRA reflects the slightly different case of wanting better
reporting of *which* document failed in a multi-document packet; it
doesn't specifically address SUSS. But it might serve to give you some
ideas if you tackle this.

On Tue, Mar 27, 2012 at 11:14 AM, Mark Miller  wrote:
>
> On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:
>
>> On 3/26/2012 6:43 PM, Mark Miller wrote:
>>> It doesn't get thrown because that logic needs to continue - you don't 
>>> necessarily want one bad document to stop all the following documents from 
>>> being added. So the exception is sent to that method with the idea that you 
>>> can override and do what you would like. I've written sample code around 
>>> stopping and throwing an exception, but I guess it's not totally trivial. 
>>> Other ideas for reporting errors have been thrown around in the past, but 
>>> no work on it has gotten any traction.
>>
>> It looks like StreamingUpdateSolrServer is not meant for situations where 
>> strict error checking is required.  I think the documentation should reflect 
>> that.  Would you be opposed to a javadoc update at the class level (plus a 
>> wiki addition) like the following? "Because document inserts are handled as 
>> background tasks, exceptions and errors that occur during those operations 
>> will not be available to the calling program, but they will be logged.  For 
>> example, if the Solr server is down, your program must determine this on its 
>> own.  If you need strict error handling, use CommonsHttpSolrServer."  If my 
>> wording is bad, feel free to make suggestions.
>>
>> If I'm wrong and you do have an example of an error handling override that 
>> would do what I need, I would love to see it.  From what I can tell, add 
>> requests are pushed down and handled by Runner threads, completely 
>> disconnected from the request.  The response to add calls always seems to be 
>> a NOTE element saying "the request is processed in a background stream", 
>> even if successful.
>>
>> Thanks,
>> Shawn
>>
>
>
> I'm not saying what it's meant for, I'm just saying what it is. Currently, 
> the only thing you can do to check for errors is override that method. I 
> understand it's still somewhat limiting - it depends on your use case how 
> well it can work. For example, I've known people that just want to stop the 
> update process if a doc fails, and throw an exception. You can write code to 
> do that by extending the class and overriding handleError. You can also 
> collect the exceptions, count the fails, read and parse any error 
> messages, etc. It doesn't help you with an ID or anything though - unless you 
> get unlucky/lucky and can parse it out of error messages (if it's even in 
> them). It might be more useful if you could set the name of an id field for 
> it to look for and perhaps also dump to that method.
>
> There have been previous conversations about improving error reporting for 
> this SolrServer, but no work has ever really gotten off the ground. There may 
> be existing JIRA issues around this topic - certainly there are previous 
> email threads.
>
> All in all though, please, make all the suggestions and JIRA issues you 
> want. Javadoc improvements can be submitted as patches through JIRA as well. 
> Also, the Wiki is open to anyone to update.
>
> - Mark Miller
> lucidimagination.com
>


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mark Miller

On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:

> On 3/26/2012 6:43 PM, Mark Miller wrote:
>> It doesn't get thrown because that logic needs to continue - you don't 
>> necessarily want one bad document to stop all the following documents from 
>> being added. So the exception is sent to that method with the idea that you 
>> can override and do what you would like. I've written sample code around 
>> stopping and throwing an exception, but I guess it's not totally trivial. 
>> Other ideas for reporting errors have been thrown around in the past, but no 
>> work on it has gotten any traction.
> 
> It looks like StreamingUpdateSolrServer is not meant for situations where 
> strict error checking is required.  I think the documentation should reflect 
> that.  Would you be opposed to a javadoc update at the class level (plus a 
> wiki addition) like the following? "Because document inserts are handled as 
> background tasks, exceptions and errors that occur during those operations 
> will not be available to the calling program, but they will be logged.  For 
> example, if the Solr server is down, your program must determine this on its 
> own.  If you need strict error handling, use CommonsHttpSolrServer."  If my 
> wording is bad, feel free to make suggestions.
> 
> If I'm wrong and you do have an example of an error handling override that 
> would do what I need, I would love to see it.  From what I can tell, add 
> requests are pushed down and handled by Runner threads, completely 
> disconnected from the request.  The response to add calls always seems to be 
> a NOTE element saying "the request is processed in a background stream", even 
> if successful.
> 
> Thanks,
> Shawn
> 


I'm not saying what it's meant for, I'm just saying what it is. Currently, the 
only thing you can do to check for errors is override that method. I understand 
it's still somewhat limiting - it depends on your use case how well it can 
work. For example, I've known people that just want to stop the update process 
if a doc fails, and throw an exception. You can write code to do that by 
extending the class and overriding handleError. You can also collect the 
exceptions, count the fails, read and parse any error messages, etc. It doesn't 
help you with an ID or anything though - unless you get unlucky/lucky and can 
parse it out of error messages (if it's even in them). It might be more useful 
if you could set the name of an id field for it to look for and perhaps also 
dump to that method.
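To make the extend-and-override idea concrete, here is a minimal sketch
(untested; the class and field names are invented for illustration):

import java.net.MalformedURLException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class ErrorRecordingSolrServer extends StreamingUpdateSolrServer {

    // errors handed to us by the background Runner threads
    private final List<Throwable> errors = new CopyOnWriteArrayList<Throwable>();

    public ErrorRecordingSolrServer(String url, int queueSize, int threadCount)
            throws MalformedURLException {
        super(url, queueSize, threadCount);
    }

    @Override
    public void handleError(Throwable ex) {
        errors.add(ex);        // remember the failure for the caller
        super.handleError(ex); // keep the default logging
    }

    // call after blockUntilFinished() to see whether anything failed
    public List<Throwable> getErrors() {
        return errors;
    }
}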

There have been previous conversations about improving error reporting for this 
SolrServer, but no work has ever really gotten off the ground. There may be 
existing JIRA issues around this topic - certainly there are previous email 
threads.

All in all though, please, make all the suggestions and JIRA issues you want. 
Javadoc improvements can be submitted as patches through JIRA as well. Also, 
the Wiki is open to anyone to update. 

- Mark Miller
lucidimagination.com


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
If the list of words isn't very long, you can add a StopFilter to the analysis 
for "itemDescSpell" and put the words you don't want in the stop list.  If you 
want to prevent low-occurring words from being used as corrections, use the 
"thresholdTokenFrequency" in your spellcheck configuration.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Tuesday, March 27, 2012 9:07 AM
To: solr-user@lucene.apache.org
Subject: preventing words from being indexed in spellcheck dictionary?

hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnCommit">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
</lst>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread Dyer, James
It might be easier to know what's going on if you provide some snippets from 
solrconfig.xml and schema.xml.  But my guess is that in your solrconfig.xml, 
under the spellcheck "searchComponent" either the "queryAnalyzerFieldType" or 
the "fieldType" (one level down) is set to a field that is removing numbers or 
otherwise modifying the tokens on analysis.  The reason is that your query 
contained "ccc" but it says that "1" is a misspelled word in your query.  
Typically you want a simple analysis chain that just tokenizes on whitespace 
and little else for spellchecking.
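A sketch of such a minimal chain (untested; the type name is invented):

<fieldType name="spellText" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>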

With that said, I wouldn't be surprised if this was a bug as we've had problems 
in the past with words containing numbers, dashes and the like.  If you become 
convinced you've found a bug, would you be able to write a failing unit test 
and post it on JIRA?  See http://wiki.apache.org/solr/HowToContribute for more 
information.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: tom [mailto:dev.tom.men...@gmx.net] 
Sent: Tuesday, March 27, 2012 2:31 AM
To: solr-user@lucene.apache.org
Subject: Re: possible spellcheck bug in 3.5 causing erroneous suggestions

so, anyone have a clue what might be going wrong?

or do i have to debug it myself and post a jira issue?

PS: unfortunately i can't give anyone the index for testing due to an NDA.

cheers

On 22.03.2012 10:17, tom wrote:
> same
>
> On 22.03.2012 10:00, Markus Jelsma wrote:
>> Can you try spellcheck.q ?
>>
>>
>> On Thu, 22 Mar 2012 09:57:19 +0100, tom  wrote:
>>> hi folks,
>>>
>>> i think i found a bug in the spellchecker but am not quite sure:
>>> this is the query i send to solr:
>>>
>>> http://lh:8983/solr/CompleteIndex/select?
>>> &rows=0
>>> &echoParams=all
>>> &spellcheck=true
>>> &spellcheck.onlyMorePopular=true
>>> &spellcheck.extendedResults=no
>>> &q=a+bb+ccc++
>>>
>>> and this is the result:
>>>
>>> <response>
>>> <lst name="responseHeader">
>>>   <int name="status">0</int>
>>>   <int name="QTime">4</int>
>>>   <lst name="params">
>>>     <str name="echoParams">all</str>
>>>     <str name="spellcheck">true</str>
>>>     <str name="...">all</str>
>>>     <str name="spellcheck.extendedResults">no</str>
>>>     <str name="q">a bb ccc </str>
>>>     <str name="rows">0</str>
>>>     <str name="spellcheck.onlyMorePopular">true</str>
>>>   </lst>
>>> </lst>
>>> <lst name="spellcheck">
>>>   <lst name="suggestions">
>>>     <lst name="...">
>>>       <int name="numFound">1</int>
>>>       <int name="startOffset">2</int>
>>>       <int name="endOffset">4</int>
>>>       <arr name="suggestion">
>>>         <str>abb</str>
>>>       </arr>
>>>     </lst>
>>>     <lst name="...">
>>>       <int name="numFound">1</int>
>>>       <int name="startOffset">5</int>
>>>       <int name="endOffset">8</int>
>>>       <arr name="suggestion">
>>>         <str>ccc</str>
>>>       </arr>
>>>     </lst>
>>>     <lst name="...">
>>>       <int name="numFound">1</int>
>>>       <int name="startOffset">5</int>
>>>       <int name="endOffset">8</int>
>>>       <arr name="suggestion">
>>>         <str>ccc</str>
>>>       </arr>
>>>     </lst>
>>>     <lst name="...">
>>>       <int name="numFound">1</int>
>>>       <int name="startOffset">10</int>
>>>       <int name="endOffset">14</int>
>>>       <arr name="suggestion">
>>>         <str>dvd</str>
>>>       </arr>
>>>     </lst>
>>>   </lst>
>>> </lst>
>>> </response>
>>>
>>> now, i know this is just a technical query; i did it for a test regarding
>>> suggestions, and i discovered the oddity just by chance - it was unrelated
>>> to the test itself:
>>> my question is how the suggestions 1 and 2 come about. from what i
>>> understand from the wiki, the entries in spellcheck/suggestions should
>>> only be (misspelled) substrings of the user query.
>>>
>>> the setup/context is thus:
>>> - the words a ccc exists 11 times in the index but 1 and 2 dont
>>>
>>>
>>> http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0
>>>  
>>>
>>>
>>>
>>> <int name="status">0</int><int name="QTime">1</int> ...
>>> <int name="ccc">11</int>
>>> -  analyzer for the spellchecker yields the terms as entered, i.e.
>>> a|bb|ccc|
>>> -  the config is thus
>>>
>>> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>>>   <str name="queryAnalyzerFieldType">textSpell</str>
>>>   <lst name="spellchecker">
>>>     <str name="name">default</str>
>>>     <str name="field">spell</str>
>>>     <str name="spellcheckIndexDir">./spellchecker</str>
>>>   </lst>
>>> </searchComponent>
>>>
>>>
>>>
>>> does anyone have a clue what's going on?
>>
>>
>



Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Shawn Heisey

On 3/26/2012 6:43 PM, Mark Miller wrote:

It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.


It looks like StreamingUpdateSolrServer is not meant for situations 
where strict error checking is required.  I think the documentation 
should reflect that.  Would you be opposed to a javadoc update at the 
class level (plus a wiki addition) like the following? "Because document 
inserts are handled as background tasks, exceptions and errors that 
occur during those operations will not be available to the calling 
program, but they will be logged.  For example, if the Solr server is 
down, your program must determine this on its own.  If you need strict 
error handling, use CommonsHttpSolrServer."  If my wording is bad, feel 
free to make suggestions.


If I'm wrong and you do have an example of an error handling override 
that would do what I need, I would love to see it.  From what I can 
tell, add requests are pushed down and handled by Runner threads, 
completely disconnected from the request.  The response to add calls 
always seems to be a NOTE element saying "the request is processed in a 
background stream", even if successful.


Thanks,
Shawn



Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread Ahmet Arslan
> 2. If not - what's the reason delta import is implemented
> like it is?
> Why split it in two queries? I would think having a single
> delta query
> that fetches the data would be kind of an "obvious" design
> unless
> there's something that calls for 2 separate queries...?

I think this is it? https://issues.apache.org/jira/browse/SOLR-811


Re: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread Ahmet Arslan

> I am using DIH to index the local file system. But the file path, size and
> lastmodified fields were not stored. In the schema.xml I defined:
>
> <fields>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
>   <field name="..." type="..." indexed="true" stored="true"/>
> </fields>
>
> And also defined tika-data-config.xml:
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource" />
>   <document>
>     <entity name="f" dataSource="null" rootEntity="false"
>             processor="FileListEntityProcessor"
>             baseDir="E:/my_project/ecmkit/infotouch"
>             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>             onError="skip"
>             recursive="true">
>       <entity name="tika-test" dataSource="bin"
>               processor="TikaEntityProcessor"
>               url="${f.fileAbsolutePath}" format="text"
>               onError="skip">
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>         <field column="..." name="..." />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> The Solr version is 3.5. Any idea?

The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified 
are generated by the FileListEntityProcessor. They should be defined above the 
TikaEntityProcessor.


dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread janne mattila
It seems that delta import works in 2 steps: the first query fetches the
ids of the modified entries, then the second query fetches the actual
data.

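A minimal sketch of that stock two-step pattern in data-config.xml (the table
and column names here are invented for illustration):

<entity name="item" pk="ID"
        query="select * from item"
        deltaQuery="select ID from item
                    where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from item
                          where ID = '${dataimporter.delta.ID}'"/>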
I am aware that there's a workaround:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

But still, to clarify, and to make sure I have up-to-date info on how Solr works:

1. Is it possible to fetch the modified data with a single SQL query
using deltaImportQuery, as in:

deltaImportQuery="select * from item where last_modified >
'${dataimporter.last_index_time}'"?

2. If not - what's the reason delta import is implemented like it is?
Why split it in two queries? I would think having a single delta query
that fetches the data would be kind of an "obvious" design unless
there's something that calls for 2 separate queries...?


preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnCommit">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
</lst>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cores issue

2012-03-27 Thread Erick Erickson
It might be administratively easier to have multiple webapps, but
it shouldn't really matter as far as I know...

Best
Erick

On Tue, Mar 27, 2012 at 12:22 AM, Sujatha Arun  wrote:
> yes, I must have mis-copied, and yes, i do have the conf folder per core
> with schema etc ...
>
> Because of this issue, we have decided to have multiple webapps with about
> 50 cores per webapp, instead of one single webapp with all 200 cores. Would
> this make better sense?
>
> what would be your suggestion?
>
> Regards
> Sujatha
>
> On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson 
> wrote:
>
>> Shouldn't be. What do your log files say? You have to treat each
>> core as a separate index. In other words, you need to have a core#/conf
>> with the schema matching your core#/data/index directory etc.
>>
>> I suspect you've simply mis-copied something.
>>
>> Best
>> Erick
>>
>> On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun  wrote:
>> > I was migrating to cores from a webapp, and I was copying a bunch of
>> > indexes from webapps to their respective cores. When I restarted, I had
>> > this issue where the whole webapp with the cores would not start up and
>> > I was getting an index corrupted message..
>> >
>> > In this scenario, or in a scenario where there is an issue with the
>> > schema/config file for one core, will the whole webapp with the cores not
>> > restart?
>> >
>> > Regards
>> > Sujatha
>> >
>> > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson > >wrote:
>> >
>> >> Index corruption is very rare, can you provide more details how you
>> >> got into that state?
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun 
>> wrote:
>> >> > Hello,
>> >> >
>> >> > Suppose I have several cores in a single webapp, and I have an issue
>> >> > with the index being corrupted in one core, or the schema/solrconfig of
>> >> > one core is not well formed; then the entire webapp refuses to load on
>> >> > server restart?
>> >> >
>> >> > Why does this happen?
>> >> >
>> >> > Regards
>> >> > Sujatha
>> >>
>>


Re: document inside document?

2012-03-27 Thread Erick Erickson
For your tagging, think about using multiValued="true" with
an increment gap of, say, 100. Then your searches
on this field can be phrase queries with a smaller slop
e.g. "tall woman"~90 would match, but "purse gucci"~90
would not because "purse" and "gucci" are not within 90
tokens of each other.
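A sketch of that setup (untested; the type and field names are invented):

<!-- the 100-token gap between values is what keeps "purse gucci"~90
     from matching across two different tags -->
<fieldType name="text_tag" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="itemTags" type="text_tag" indexed="true" stored="true" multiValued="true"/>

A query like itemTags:"tall woman"~90 then matches within a single tag value
but not across tags.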

As far as the metadata is concerned, this is just specifying
which fields should be queried, see the "qf" parameter
in edismax.

As far as fieldType is concerned, spend some time with admin/analysis to understand
the kinds of things various tokenizers and filters do; your question is really
too broad to answer. I'd start with one of the text types and iterate.

Grouping on primary key is a pretty useless thing to do, what is your
use case?

And you'll just have to get used to denormalizing data with Solr/Lucene,
which is hard for a DB person; it just feels icky ...

Best
Erick

On Mon, Mar 26, 2012 at 3:00 PM, sam ”  wrote:
> Hey,
>
> I am making an image search engine where people can tag images with various
> items that are themselves tagged.
> For example, http://example.com/abc.jpg is tagged with the following three
> items:
> - item1 that is tagged with: tall blond woman
> - item2 that is tagged with: yellow purse
> - item3 that is tagged with: gucci red dress
>
> Querying for +yellow +purse  will return the example image. But, querying
> for +gucci +purse will not because the image does not have an item tagged
> with both gucci and purse.
>
> In addition to "items", each image has various metadata such as alt text,
> location, description, photo credit.. etc  that should be available for
> search.
>
> How should I write my schema.xml ?
> If imageUrl is primary key, do I implement my own fieldType for items, so
> that I can write:
> 
> What would myItemType look like so that solr would know the example image
> will not be part of the query, +gucci +purse??
>
> If itemId is primary key, I can use result grouping (
> http://wiki.apache.org/solr/FieldCollapsing). But, I need to repeat alt
> text and other image metadata for each item.
>
> Or, should I create different schema for item search and metadata search?
>
> Thanks.
> Sam.


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mark Miller
Like I said, you have to extend the class and override the error method. 

Sent from my iPhone

On Mar 27, 2012, at 2:29 AM, Shawn Heisey  wrote:

> On 3/26/2012 10:25 PM, Shawn Heisey wrote:
>> The problem is that I currently have no way (that I know of so far) to 
>> detect that a problem happened.  As far as my code is concerned, everything 
>> worked, so it updates my position tracking and those documents will never be 
>> inserted.  I have not yet delved into the response object to see whether it 
>> can tell me anything.  My code currently assumes that if no exception was 
>> thrown, it was successful.  This works with CHSS.  I will write some test 
>> code that tries out various error situations and see what the response 
>> contains.
> 
> I've written some test code.  When doing an add with SUSS against a server 
> that's down, no exception is thrown.  It does throw one for query and 
> deleteByQuery.  When doing the add test with CHSS, an exception is thrown.  I 
> guess I'll just have to use CHSS until this gets fixed, assuming it ever 
> does.  Would it be at all helpful to file an issue in jira, or has one 
> already been filed?  With a quick search, I could not find one.
> 
> Thanks,
> Shawn
> 


CLOSE_WAIT connections

2012-03-27 Thread Bernd Fehling

Hi list,

I have looked into the CLOSE_WAIT problem and created an issue with a patch to 
fix this.
A search for CLOSE_WAIT shows that there are many Apache projects hit by this 
problem.

https://issues.apache.org/jira/browse/SOLR-3280

Can someone recheck the patch (it belongs to SnapPuller) and give the OK for 
release?
The patch is against branch_3x (3.6).


Regards
Bernd


Re: Client-side failover with SolrJ

2012-03-27 Thread darul
I rediscover the world every day, thanks for this.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Client-side-failover-with-SolrJ-tp3858461p3860700.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Hi,

I am using DIH to index the local file system. But the file path, size and
lastmodified fields were not stored. In the schema.xml I defined:

<fields>
  <field name="..." type="..." indexed="true" stored="true"/>
  <field name="..." type="..." indexed="true" stored="true"/>
  <field name="..." type="..." indexed="true" stored="true"/>
  <field name="..." type="..." indexed="true" stored="true"/>
  <field name="..." type="..." indexed="true" stored="true"/>
</fields>

And also defined tika-data-config.xml:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="E:/my_project/ecmkit/infotouch"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip"
            recursive="true">
      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text"
              onError="skip">
        <field column="..." name="..." />
        <field column="..." name="..." />
        <field column="..." name="..." />
        <field column="..." name="..." />
        <field column="..." name="..." />
        <field column="..." name="..." />
      </entity>
    </entity>
  </document>
</dataConfig>

The Solr version is 3.5. Any idea?

Thanks in advance.



Liang


Re: Using the ids parameter

2012-03-27 Thread Dmitry Kan
So I solved it by using key:(id1 OR ... idn).

On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan  wrote:

> Hi,
>
> Actually we ran into the same issue with using the ids parameter, in the solr
> front with a shards architecture (the exception is thrown in the solr front). Were
> you able to solve it by using the key:value syntax or some other way?
>
> BTW, there was a related issue:
> https://issues.apache.org/jira/browse/SOLR-1477
> but it's marked as Won't Fix, does anyone know why it is so, or if this is
> planned to be resolved?
>
> Dmitry
>
>
> On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson  wrote:
>
>> We're running into an issue where we are trying to use the ids=
>> parameter to return a set of documents given their id.  This seems to
>> work intermittently when running in SolrCloud.  The first question I
>> have is this something that we should be using or instead should we
>> doing a query with key:?  The stack trace that I am getting right now
>> is included below, any thoughts would be appreciated.
>>
>> Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
>> INFO: [slice1_shard1] webapp=/solr path=/select
>>
>> params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
>> status=500 QTime=32
>> Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
>> SEVERE: null:java.lang.NullPointerException
>>  at
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
>>  at
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
>>  at
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
>>  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
>>  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
>>  at
>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
>>  at
>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
>>  at
>> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>>  at
>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
>>  at
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
>>  at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
>>  at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
>>  at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
>>  at
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>  at
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>  at
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>  at
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>  at
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>  at
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>  at
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>  at
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>  at org.mortbay.jetty.Server.handle(Server.java:326)
>>  at
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>  at
>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>  at
>> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>>  at
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>
>
>
>


-- 
Regards,

Dmitry Kan


Re: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread tom

so, anyone have a clue what might be going wrong?

or do i have to debug it myself and post a jira issue?

PS: unfortunately i can't give anyone the index for testing due to an NDA.

cheers

On 22.03.2012 10:17, tom wrote:

same

On 22.03.2012 10:00, Markus Jelsma wrote:

Can you try spellcheck.q ?


On Thu, 22 Mar 2012 09:57:19 +0100, tom  wrote:

hi folks,

i think i found a bug in the spellchecker but am not quite sure:
this is the query i send to solr:

http://lh:8983/solr/CompleteIndex/select?
&rows=0
&echoParams=all
&spellcheck=true
&spellcheck.onlyMorePopular=true
&spellcheck.extendedResults=no
&q=a+bb+ccc++

and this is the result:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="echoParams">all</str>
    <str name="spellcheck">true</str>
    <str name="...">all</str>
    <str name="spellcheck.extendedResults">no</str>
    <str name="q">a bb ccc </str>
    <str name="rows">0</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
</lst>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="...">
      <int name="numFound">1</int>
      <int name="startOffset">2</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>abb</str>
      </arr>
    </lst>
    <lst name="...">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>ccc</str>
      </arr>
    </lst>
    <lst name="...">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>ccc</str>
      </arr>
    </lst>
    <lst name="...">
      <int name="numFound">1</int>
      <int name="startOffset">10</int>
      <int name="endOffset">14</int>
      <arr name="suggestion">
        <str>dvd</str>
      </arr>
    </lst>
  </lst>
</lst>
</response>
now, i know this is just a technical query; i did it for a test regarding
suggestions, and i discovered the oddity just by chance - it was unrelated
to the test itself:
my question is how the suggestions 1 and 2 come about. from what i
understand from the wiki, the entries in spellcheck/suggestions should
only be (misspelled) substrings of the user query.

the setup/context is thus:
- the words a ccc exists 11 times in the index but 1 and 2 dont


http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0 




<int name="status">0</int><int name="QTime">1</int> ...
<int name="ccc">11</int>
-  analyzer for the spellchecker yields the terms as entered, i.e.
a|bb|ccc|
-  the config is thus

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>


does anyone have a clue what's going on?