Re: Using the ids parameter

2012-03-27 Thread Dmitry Kan
Hi,

Actually we ran into the same issue with the ids parameter, in the Solr
front with a shards architecture (the exception is thrown in the Solr front). Were
you able to solve it by using the key:value syntax or some other way?

BTW, there was a related issue:
https://issues.apache.org/jira/browse/SOLR-1477
but it's marked as Won't Fix. Does anyone know why that is, or if this is
planned to be resolved?

Dmitry

On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson jej2...@gmail.com wrote:

 We're running into an issue where we are trying to use the ids=
 parameter to return a set of documents given their id.  This seems to
 work intermittently when running in SolrCloud.  The first question I
 have is: is this something that we should be using, or should we instead
 be doing a query with key:?  The stack trace that I am getting right now
 is included below; any thoughts would be appreciated.

 Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
 INFO: [slice1_shard1] webapp=/solr path=/select

 params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
 status=500 QTime=32
 Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.NullPointerException
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
  at
 org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
  at
 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
  at
 org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
  at
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
  at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
  at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
  at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Shawn Heisey

On 3/26/2012 10:25 PM, Shawn Heisey wrote:
The problem is that I currently have no way (that I know of so far) to 
detect that a problem happened.  As far as my code is concerned, 
everything worked, so it updates my position tracking and those 
documents will never be inserted.  I have not yet delved into the 
response object to see whether it can tell me anything.  My code 
currently assumes that if no exception was thrown, it was successful.  
This works with CHSS.  I will write some test code that tries out 
various error situations and see what the response contains.


I've written some test code.  When doing an add with SUSS against a 
server that's down, no exception is thrown.  It does throw one for query 
and deleteByQuery.  When doing the add test with CHSS, an exception is 
thrown.  I guess I'll just have to use CHSS until this gets fixed, 
assuming it ever does.  Would it be at all helpful to file an issue in 
jira, or has one already been filed?  With a quick search, I could not 
find one.


Thanks,
Shawn



Re: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread tom

so, anyone has a clue what's (or might be) going wrong?

or do i have to debug it myself and post a jira issue?

PS: unfortunately i cant give anyone the index for testing due to NDA.

cheers

On 22.03.2012 10:17, tom wrote:

same

On 22.03.2012 10:00, Markus Jelsma wrote:

Can you try spellcheck.q ?
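
(A sketch of that suggestion, applied to the query from the original mail below:)

http://lh:8983/solr/CompleteIndex/select?rows=0&spellcheck=true&spellcheck.q=a+bb+ccc++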


On Thu, 22 Mar 2012 09:57:19 +0100, tom dev.tom.men...@gmx.net wrote:

hi folks,

i think i found a bug in the spellchecker but am not quite sure:
this is the query i send to solr:

http://lh:8983/solr/CompleteIndex/select?rows=0
&echoParams=all
&spellcheck=true
&spellcheck.onlyMorePopular=true
&spellcheck.extendedResults=no
&q=a+bb+ccc++

and this is the result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">4</int>
<lst name="params">
<str name="echoParams">all</str>
<str name="spellcheck">true</str>
<str name="echoParams">all</str>
<str name="spellcheck.extendedResults">no</str>
<str name="q">a bb ccc  </str>
<str name="rows">0</str>
<str name="spellcheck.onlyMorePopular">true</str>
</lst>
</lst>
<result name="response" numFound="43" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="bb">
<int name="numFound">1</int>
<int name="startOffset">2</int>
<int name="endOffset">4</int>
<arr name="suggestion">
<str>abb</str>
</arr>
</lst>
<lst name="1">
<int name="numFound">1</int>
<int name="startOffset">5</int>
<int name="endOffset">8</int>
<arr name="suggestion">
<str>ccc</str>
</arr>
</lst>
<lst name="2">
<int name="numFound">1</int>
<int name="startOffset">5</int>
<int name="endOffset">8</int>
<arr name="suggestion">
<str>ccc</str>
</arr>
</lst>
<lst name="">
<int name="numFound">1</int>
<int name="startOffset">10</int>
<int name="endOffset">14</int>
<arr name="suggestion">
<str>dvd</str>
</arr>
</lst>
</lst>
</lst>
</response>

now, i know this is just a technical query; i did it for a test
regarding suggestions, and i discovered the oddity just by chance;
it was not related to the test itself.
my question is how the suggestions 1 and 2 come about:
from what i understand from the wiki, the entries in
spellcheck/suggestions should only be (misspelled) substrings of the user
query.

the setup/context is thus:
- the words a and ccc exist in the index (ccc 11 times), but 1 and 2 don't


http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0




<response><lst name="responseHeader"><int name="status">0</int><int
name="QTime">1</int></lst><lst name="terms"><lst name="spell"><int
name="ccc">11</int></lst></lst></response>
-  analyzer for the spellchecker yields the terms as entered, i.e.
a|bb|ccc|
-  the config is thus

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

<str name="queryAnalyzerFieldType">textSpell</str>

<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
</searchComponent>


does anyone have a clue what's going on?



Re: Using the ids parameter

2012-03-27 Thread Dmitry Kan
So I solved it by using key:(id1 OR ... idn).
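
(For reference, a sketch of that workaround as a full request; the host,
core and the field name "key" are placeholders, and the colons inside the
urn:uuid ids must be escaped in the Lucene query syntax:)

http://localhost:8983/solr/select?q=key:(4f14cc9b-f669-4d6f-85ae-b22fad143492
  OR urn\:uuid\:020335a7-1476-43d6-8f91-241bce1e7696
  OR urn\:uuid\:352473eb-af56-4f6f-94d5-c0096dcb08d4)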

On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan dmitry@gmail.com wrote:

 Hi,

 Actually we ran into the same issue with the ids parameter, in the Solr
 front with a shards architecture (the exception is thrown in the Solr front). Were
 you able to solve it by using the key:value syntax or some other way?

 BTW, there was a related issue:
 https://issues.apache.org/jira/browse/SOLR-1477
 but it's marked as Won't Fix. Does anyone know why that is, or if this is
 planned to be resolved?

 Dmitry


 On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson jej2...@gmail.com wrote:

 We're running into an issue where we are trying to use the ids=
 parameter to return a set of documents given their id.  This seems to
 work intermittently when running in SolrCloud.  The first question I
 have is: is this something that we should be using, or should we instead
 be doing a query with key:?  The stack trace that I am getting right now
 is included below; any thoughts would be appreciated.

 Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
 INFO: [slice1_shard1] webapp=/solr path=/select

 params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
 status=500 QTime=32
 Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.NullPointerException
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
  at
 org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
  at
 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
  at
 org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
  at
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
  at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
  at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
  at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)






-- 
Regards,

Dmitry Kan


how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Hi,

I am using DIH to index the local file system. But the file path, size and
lastmodified fields were not stored. In the schema.xml I defined:

 <fields>
   <field name="title" type="string" indexed="true" stored="true"/>
   <field name="author" type="string" indexed="true" stored="true" />
   <!--<field name="text" type="text" indexed="true" stored="true" />
liang added-->
   <field name="path" type="string" indexed="true" stored="true" />
   <field name="size" type="long" indexed="true" stored="true" />
   <field name="lastmodified" type="date" indexed="true" stored="true" />
 </fields>


And also defined tika-data-config.xml:

<dataConfig>
<dataSource name="bin" type="BinFileDataSource" />
<document>
<entity name="f" dataSource="null" rootEntity="false"
processor="FileListEntityProcessor"
baseDir="E:/my_project/ecmkit/infotouch"
fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)" onError="skip"
recursive="true">
<entity name="tika-test" dataSource="bin"
processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<!--
<field column="text" name="text"/> -->
<field column="fileAbsolutePath" name="path" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastmodified" />
</entity>
</entity>
</document>
</dataConfig>


The Solr version is 3.5. Any ideas?

Thanks in advance.



Liang


Re: Client-side failover with SolrJ

2012-03-27 Thread darul
I rediscover the world every day, thanks for this.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Client-side-failover-with-SolrJ-tp3858461p3860700.html
Sent from the Solr - User mailing list archive at Nabble.com.


CLOSE_WAIT connections

2012-03-27 Thread Bernd Fehling

Hi list,

I have looked into the CLOSE_WAIT problem and created an issue with a patch to 
fix this.
A search for CLOSE_WAIT shows that there are many Apache projects hit by this 
problem.

https://issues.apache.org/jira/browse/SOLR-3280

Can someone recheck the patch (it belongs to SnapPuller) and give the OK for 
release?
The patch is against branch_3x (3.6).


Regards
Bernd


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mark Miller
Like I said, you have to extend the class and override the error method. 

Sent from my iPhone

On Mar 27, 2012, at 2:29 AM, Shawn Heisey s...@elyograg.org wrote:

 On 3/26/2012 10:25 PM, Shawn Heisey wrote:
 The problem is that I currently have no way (that I know of so far) to 
 detect that a problem happened.  As far as my code is concerned, everything 
 worked, so it updates my position tracking and those documents will never be 
 inserted.  I have not yet delved into the response object to see whether it 
 can tell me anything.  My code currently assumes that if no exception was 
 thrown, it was successful.  This works with CHSS.  I will write some test 
 code that tries out various error situations and see what the response 
 contains.
 
 I've written some test code.  When doing an add with SUSS against a server 
 that's down, no exception is thrown.  It does throw one for query and 
 deleteByQuery.  When doing the add test with CHSS, an exception is thrown.  I 
 guess I'll just have to use CHSS until this gets fixed, assuming it ever 
 does.  Would it be at all helpful to file an issue in jira, or has one 
 already been filed?  With a quick search, I could not find one.
 
 Thanks,
 Shawn
 


Re: document inside document?

2012-03-27 Thread Erick Erickson
For your tagging, think about using multiValued="true" with
a position increment gap of, say, 100. Then your searches
on this field can be phrase queries with a smaller slop,
e.g. "tall woman"~90 would match, but "purse gucci"~90
would not, because purse and gucci are not within 90
tokens of each other.
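
(A sketch of that setup; the type and field names are illustrative, and the
gap is declared on the fieldType:)

<!-- each value of the multiValued field starts 100 positions after the last -->
<fieldType name="text_item" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="items" type="text_item" indexed="true" stored="true" multiValued="true"/>

A query such as items:"gucci purse"~90 then matches only when both words
fall inside a single tagged item.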

As far as the metadata is concerned, this is just specifying
which fields should be queried, see the qf parameter
in edismax.
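
(A sketch, with made-up field names:)

q=gucci+purse&defType=edismax&qf=items+alt_text+description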

As far as fieldType, spend some time with admin/analysis to understand
the kinds of things that various tokenizers and filters do; your question is really
too broad to answer. I'd start with one of the text types and iterate.

Grouping on primary key is a pretty useless thing to do; what is your
use case?

And you'll just have to get used to denormalizing data with Solr/Lucene,
which is hard for a DB person; it just feels icky <g>..

Best
Erick

On Mon, Mar 26, 2012 at 3:00 PM, sam ” skyn...@gmail.com wrote:
 Hey,

 I am making an image search engine where people can tag images with various
 items that are themselves tagged.
 For example, http://example.com/abc.jpg is tagged with the following three
 items:
 - item1 that is tagged with: tall blond woman
 - item2 that is tagged with: yellow purse
 - item3 that is tagged with: gucci red dress

 Querying for +yellow +purse  will return the example image. But, querying
 for +gucci +purse will not because the image does not have an item tagged
 with both gucci and purse.

 In addition to items, each image has various metadata such as alt text,
 location, description, photo credit.. etc  that should be available for
 search.

 How should I write my schema.xml ?
 If imageUrl is primary key, do I implement my own fieldType for items, so
 that I can write:
 <field name="items" type="myItemType" multiValued="true"/>
 What would myItemType look like so that solr would know the example image
 will not be part of the query, +gucci +purse??

 If itemId is primary key, I can use result grouping (
 http://wiki.apache.org/solr/FieldCollapsing). But, I need to repeat alt
 text and other image metadata for each item.

 Or, should I create different schema for item search and metadata search?

 Thanks.
 Sam.


Re: Solr cores issue

2012-03-27 Thread Erick Erickson
It might be administratively easier to have multiple webapps, but
it shouldn't really matter as far as I know...

Best
Erick

On Tue, Mar 27, 2012 at 12:22 AM, Sujatha Arun suja.a...@gmail.com wrote:
 yes, I must have mis-copied, and yes, i do have the conf folder per core
 with schema etc ...

 Because of this issue, we have decided to have multiple webapps with about
 50 cores per webapp, instead of one single webapp with all 200 cores. Would
 this make better sense?

 what would be your suggestion?

 Regards
 Sujatha

 On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Shouldn't be. What do your log files say? You have to treat each
 core as a separate index. In other words, you need to have a core#/conf
 with the schema matching your core#/data/index directory etc.

 I suspect you've simply mis-copied something.

 Best
 Erick

 On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun suja.a...@gmail.com wrote:
   I was migrating to cores from webapps, and I was copying a bunch of
   indexes from webapps to respective cores; when I restarted, I had this
   issue where the whole webapp with the cores would not start up and I was
   getting an index corrupted message..

   In this scenario, or in a scenario where there is an issue with the
   schema/config file for one core, will the whole webapp with the cores not
   restart?
 
  Regards
  Sujatha
 
  On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Index corruption is very rare, can you provide more details how you
  got into that state?
 
  Best
  Erick
 
  On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun suja.a...@gmail.com
 wrote:
   Hello,
  
    Suppose I have several cores in a single webapp, and I have an issue with the
    index being corrupted in one core, or the schema/solrconfig of one core is not
    well formed; then the entire webapp refuses to load on server restart?
  
   Why does this happen?
  
   Regards
   Sujatha
 



preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.


dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread janne mattila
It seems that delta import works in 2 steps, first query fetches the
ids of the modified entries, then second query fetches the actual
data.

<entity name="item" pk="ID"
    query="select * from item"
    deltaImportQuery="select * from item where
        ID='${dataimporter.delta.id}'"
    deltaQuery="select id from item where last_modified
        &gt; '${dataimporter.last_index_time}'">
    <entity name="feature" pk="ITEM_ID"
        query="select description as features from feature
            where item_id='${item.ID}'">
    </entity>
    <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
        query="select CATEGORY_ID from item_category where
            ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID"
            query="select description as cat from category
                where id = '${item_category.CATEGORY_ID}'">
        </entity>
    </entity>
</entity>

I am aware that there's a workaround:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
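
(The gist of that wiki recipe, sketched against the item table above: run a
full-import whose query doubles as the delta query, so a single SQL statement
fetches the changed rows directly:)

<entity name="item" pk="ID"
    query="select * from item
        where '${dataimporter.request.clean}' != 'false'
            or last_modified &gt; '${dataimporter.last_index_time}'">
    ...
</entity>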

But still, to clarify, and make sure I have up-to-date info how Solr works:

1. Is it possible to fetch the modified data with a single SQL query
using deltaImportQuery, as in:

deltaImportQuery="select * from item where last_modified &gt;
'${dataimporter.last_index_time}'"?

2. If not - what's the reason delta import is implemented like it is?
Why split it in two queries? I would think having a single delta query
that fetches the data would be kind of an obvious design unless
there's something that calls for 2 separate queries...?


Re: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread Ahmet Arslan

 I am using DIH to index the local file system. But the file
 path, size and lastmodified fields were not stored. In the
 schema.xml I defined:

  <fields>
    <field name="title" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true" />
    <!--<field name="text" type="text" indexed="true" stored="true" />
     liang added-->
    <field name="path" type="string" indexed="true" stored="true" />
    <field name="size" type="long" indexed="true" stored="true" />
    <field name="lastmodified" type="date" indexed="true" stored="true" />
  </fields>


 And also defined tika-data-config.xml:

 <dataConfig>
     <dataSource name="bin" type="BinFileDataSource" />
     <document>
         <entity name="f" dataSource="null" rootEntity="false"
             processor="FileListEntityProcessor"
             baseDir="E:/my_project/ecmkit/infotouch"
             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
             onError="skip"
             recursive="true">
             <entity name="tika-test" dataSource="bin"
                 processor="TikaEntityProcessor"
                 url="${f.fileAbsolutePath}" format="text"
                 onError="skip">
                 <field column="Author" name="author" meta="true"/>
                 <field column="title" name="title" meta="true"/>
                 <!--
                 <field column="text" name="text"/> -->
                 <field column="fileAbsolutePath" name="path" />
                 <field column="fileSize" name="size" />
                 <field column="fileLastModified" name="lastmodified" />
             </entity>
         </entity>
     </document>
 </dataConfig>


 The Solr version is 3.5. Any ideas?

The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified 
are generated by the FileListEntityProcessor. They should be defined above the 
TikaEntityProcessor.
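
(A sketch of that change: the file* field mappings move up onto the outer
FileListEntityProcessor entity "f", so they are no longer asked of Tika:)

<entity name="f" dataSource="null" rootEntity="false"
    processor="FileListEntityProcessor"
    baseDir="E:/my_project/ecmkit/infotouch"
    fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)" onError="skip"
    recursive="true">
    <field column="fileAbsolutePath" name="path" />
    <field column="fileSize" name="size" />
    <field column="fileLastModified" name="lastmodified" />
    <entity name="tika-test" dataSource="bin"
        processor="TikaEntityProcessor"
        url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
    </entity>
</entity>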


Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread Ahmet Arslan
 2. If not - what's the reason delta import is implemented
 like it is?
 Why split it in two queries? I would think having a single
 delta query
 that fetches the data would be kind of an obvious design
 unless
 there's something that calls for 2 separate queries...?

I think this is it? https://issues.apache.org/jira/browse/SOLR-811


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Shawn Heisey

On 3/26/2012 6:43 PM, Mark Miller wrote:

It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.


It looks like StreamingUpdateSolrServer is not meant for situations 
where strict error checking is required.  I think the documentation 
should reflect that.  Would you be opposed to a javadoc update at the 
class level (plus a wiki addition) like the following? "Because document 
inserts are handled as background tasks, exceptions and errors that 
occur during those operations will not be available to the calling 
program, but they will be logged.  For example, if the Solr server is 
down, your program must determine this on its own.  If you need strict 
error handling, use CommonsHttpSolrServer."  If my wording is bad, feel 
free to make suggestions.


If I'm wrong and you do have an example of an error handling override 
that would do what I need, I would love to see it.  From what I can 
tell, add requests are pushed down and handled by Runner threads, 
completely disconnected from the request.  The response to add calls 
always seems to be a NOTE element saying "the request is processed in a 
background stream", even if successful.


Thanks,
Shawn



RE: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread Dyer, James
It might be easier to know what's going on if you provide some snippets from 
solrconfig.xml and schema.xml.  But my guess is that in your solrconfig.xml, 
under the "spellcheck" searchComponent, either the "queryAnalyzerFieldType" or 
the fieldType (one level down) is set to a field that is removing numbers or 
otherwise modifying the tokens on analysis.  The reason is that your query 
contained "ccc" but it says that "1" is a misspelled word in your query.  
Typically you want a simple analysis chain that just tokenizes on whitespace 
and does little else for spellchecking.
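
(A sketch of such a minimal chain; the type name is illustrative:)

<fieldType name="textSpellSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>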

With that said, I wouldn't be surprised if this was a bug as we've had problems 
in the past with words containing numbers, dashes and the like.  If you become 
convinced you've found a bug, would you be able to write a failing unit test 
and post it on JIRA?  See http://wiki.apache.org/solr/HowToContribute for more 
information.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: tom [mailto:dev.tom.men...@gmx.net] 
Sent: Tuesday, March 27, 2012 2:31 AM
To: solr-user@lucene.apache.org
Subject: Re: possible spellcheck bug in 3.5 causing erroneous suggestions

so, anyone has a clue what's (or might be) going wrong?

or do i have to debug it myself and post a jira issue?

PS: unfortunately i cant give anyone the index for testing due to NDA.

cheers

On 22.03.2012 10:17, tom wrote:
 same

 On 22.03.2012 10:00, Markus Jelsma wrote:
 Can you try spellcheck.q ?


 On Thu, 22 Mar 2012 09:57:19 +0100, tom dev.tom.men...@gmx.net wrote:
 hi folks,

 i think i found a bug in the spellchecker but am not quite sure:
 this is the query i send to solr:

 http://lh:8983/solr/CompleteIndex/select?rows=0
 &echoParams=all
 &spellcheck=true
 &spellcheck.onlyMorePopular=true
 &spellcheck.extendedResults=no
 &q=a+bb+ccc++

 and this is the result:

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">4</int>
 <lst name="params">
 <str name="echoParams">all</str>
 <str name="spellcheck">true</str>
 <str name="echoParams">all</str>
 <str name="spellcheck.extendedResults">no</str>
 <str name="q">a bb ccc  </str>
 <str name="rows">0</str>
 <str name="spellcheck.onlyMorePopular">true</str>
 </lst>
 </lst>
 <result name="response" numFound="43" start="0" />
 <lst name="spellcheck">
 <lst name="suggestions">
 <lst name="bb">
 <int name="numFound">1</int>
 <int name="startOffset">2</int>
 <int name="endOffset">4</int>
 <arr name="suggestion">
 <str>abb</str>
 </arr>
 </lst>
 <lst name="1">
 <int name="numFound">1</int>
 <int name="startOffset">5</int>
 <int name="endOffset">8</int>
 <arr name="suggestion">
 <str>ccc</str>
 </arr>
 </lst>
 <lst name="2">
 <int name="numFound">1</int>
 <int name="startOffset">5</int>
 <int name="endOffset">8</int>
 <arr name="suggestion">
 <str>ccc</str>
 </arr>
 </lst>
 <lst name="">
 <int name="numFound">1</int>
 <int name="startOffset">10</int>
 <int name="endOffset">14</int>
 <arr name="suggestion">
 <str>dvd</str>
 </arr>
 </lst>
 </lst>
 </lst>
 </response>

 now, i know this is just a technical query; i did it for a test
 regarding suggestions, and i discovered the oddity just by chance;
 it was not related to the test itself.
 my question is how the suggestions 1 and 2 come about:
 from what i understand from the wiki, the entries in
 spellcheck/suggestions should only be (misspelled) substrings of the user
 query.

 the setup/context is thus:
 - the words a and ccc exist in the index (ccc 11 times), but 1 and 2 don't


 http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0

 <response><lst name="responseHeader"><int name="status">0</int><int
 name="QTime">1</int></lst><lst name="terms"><lst name="spell"><int
 name="ccc">11</int></lst></lst></response>
 -  analyzer for the spellchecker yields the terms as entered, i.e.
 a|bb|ccc|
 -  the config is thus

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

 <str name="queryAnalyzerFieldType">textSpell</str>

 <lst name="spellchecker">
 <str name="name">default</str>
 <str name="field">spell</str>
 <str name="spellcheckIndexDir">./spellchecker</str>
 </lst>
 </searchComponent>


 does anyone have a clue what's going on?






RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
If the list of words isn't very long, you can add a StopFilter to the analysis 
for itemDescSpell and put the words you don't want in the stop list.  If you 
want to prevent low-occurring words from being used as corrections, use 
thresholdTokenFrequency in your spellcheck configuration.
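
(A sketch of that option added to the spellchecker block from the original
post; the .01 is illustrative and means a term must appear in at least 1% of
documents to be suggested:)

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
  <float name="thresholdTokenFrequency">.01</float>
</lst>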

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Tuesday, March 27, 2012 9:07 AM
To: solr-user@lucene.apache.org
Subject: preventing words from being indexed in spellcheck dictionary?

hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my
schema.

is there a way to prevent certain words from entering the dictionary - as
the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mark Miller

On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:

 On 3/26/2012 6:43 PM, Mark Miller wrote:
 It doesn't get thrown because that logic needs to continue - you don't 
 necessarily want one bad document to stop all the following documents from 
 being added. So the exception is sent to that method with the idea that you 
 can override and do what you would like. I've written sample code around 
 stopping and throwing an exception, but I guess it's not totally trivial. 
 Other ideas for reporting errors have been thrown around in the past, but no 
 work on it has gotten any traction.
 
 It looks like StreamingUpdateSolrServer is not meant for situations where 
 strict error checking is required.  I think the documentation should reflect 
 that.  Would you be opposed to a javadoc update at the class level (plus a 
 wiki addition) like the following? "Because document inserts are handled as 
 background tasks, exceptions and errors that occur during those operations 
 will not be available to the calling program, but they will be logged.  For 
 example, if the Solr server is down, your program must determine this on its 
 own.  If you need strict error handling, use CommonsHttpSolrServer."  If my 
 wording is bad, feel free to make suggestions.
 
 If I'm wrong and you do have an example of an error handling override that 
 would do what I need, I would love to see it.  From what I can tell, add 
 requests are pushed down and handled by Runner threads, completely 
 disconnected from the request.  The response to add calls always seems to be 
 a NOTE element saying the request is processed in a background stream, even 
 if successful.
 
 Thanks,
 Shawn
 


I'm not saying what it's meant for, I'm just saying what it is. Currently, the 
only thing you can do to check for errors is override that method. I understand 
it's still somewhat limiting - it depends on your use case how well it can 
work. For example, I've known people that just want to stop the update process 
if a doc fails, and throw an exception. You can write code to do that by 
extending the class and overriding handleError. You can also collect the 
exceptions, count the failures, read and parse any error messages, etc. It doesn't 
help you with an ID or anything though - unless you get unlucky/lucky and can 
parse it out of error messages (if it's even in them). It might be more useful 
if you could set the name of an id field for it to look for and perhaps also 
dump that to the method.
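
(For illustration, a minimal sketch of that kind of override against the
SolrJ 3.x API; everything except handleError and blockUntilFinished is an
invented name:)

import java.net.MalformedURLException;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

// Remembers the first background failure so the caller can check for it later.
public class TrackingSolrServer extends StreamingUpdateSolrServer {
    private final AtomicReference<Throwable> firstError =
            new AtomicReference<Throwable>();

    public TrackingSolrServer(String url, int queueSize, int threadCount)
            throws MalformedURLException {
        super(url, queueSize, threadCount);
    }

    @Override
    public void handleError(Throwable ex) {
        firstError.compareAndSet(null, ex); // keep only the first failure
        super.handleError(ex);              // preserve the default logging
    }

    // Call after blockUntilFinished() to surface a background failure.
    public void rethrowIfFailed() {
        Throwable t = firstError.get();
        if (t != null) {
            throw new RuntimeException("background update failed", t);
        }
    }
}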

There have been previous conversations about improving error reporting for this 
SolrServer, but no work has ever really gotten off the ground. There may be 
existing JIRA issues around this topic - certainly there are previous email 
threads.

All in all though, please, make all the suggestions and JIRA issues you want. 
Javadoc improvements can be submitted as patches through JIRA as well. Also, 
the Wiki is open to anyone to update. 

- Mark Miller
lucidimagination.com


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Erick Erickson
https://issues.apache.org/jira/browse/SOLR-445

This JIRA reflects the slightly different case of wanting better
reporting of *which* document failed in a multi-document packet; it
doesn't specifically address SUSS. But it might serve to give you some
ideas if you tackle this.

On Tue, Mar 27, 2012 at 11:14 AM, Mark Miller markrmil...@gmail.com wrote:

 On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:

 On 3/26/2012 6:43 PM, Mark Miller wrote:
 It doesn't get thrown because that logic needs to continue - you don't 
 necessarily want one bad document to stop all the following documents from 
 being added. So the exception is sent to that method with the idea that you 
 can override and do what you would like. I've written sample code around 
 stopping and throwing an exception, but I guess it's not totally trivial. 
 Other ideas for reporting errors have been thrown around in the past, but 
 no work on it has gotten any traction.

 It looks like StreamingUpdateSolrServer is not meant for situations where 
 strict error checking is required.  I think the documentation should reflect 
 that.  Would you be opposed to a javadoc update at the class level (plus a 
 wiki addition) like the following? "Because document inserts are handled as 
 background tasks, exceptions and errors that occur during those operations 
 will not be available to the calling program, but they will be logged.  For 
 example, if the Solr server is down, your program must determine this on its 
 own.  If you need strict error handling, use CommonsHttpSolrServer."  If my 
 wording is bad, feel free to make suggestions.

 If I'm wrong and you do have an example of an error handling override that 
 would do what I need, I would love to see it.  From what I can tell, add 
 requests are pushed down and handled by Runner threads, completely 
 disconnected from the request.  The response to add calls always seems to be 
 a NOTE element saying "the request is processed in a background stream", 
 even if successful.

 Thanks,
 Shawn



 I'm not saying what it's meant for, I'm just saying what it is. Currently, 
 the only thing you can do to check for errors is override that method. I 
 understand it's still somewhat limiting - it depends on your use case how 
 well it can work. For example, I've known people that just want to stop the 
 update process if a doc fails, and throw an exception. You can write code to 
 do that by extending the class and overriding handleError. You can also 
 collect the exceptions, count the failures, read and parse any error 
 messages, etc. It doesn't help you with an ID or anything though - unless you 
 get unlucky/lucky and can parse it out of error messages (if it's even in 
 them). It might be more useful if you could set the name of an id field for 
 it to look for and perhaps also dump that to the method.

 There have been previous conversations about improving error reporting for 
 this SolrServer, but no work has ever really gotten off the ground. There may 
 be existing JIRA issues around this topic - certainly there are previous 
 email threads.

 All in all though, please, make all the suggestions and JIRA issues you 
 want. Javadoc improvements can be submitted as patches through JIRA as well. 
 Also, the Wiki is open to anyone to update.

 - Mark Miller
 lucidimagination.com


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
thank you very much for the info ;)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861987.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud with Tomcat and external Zookeeper, does it work?

2012-03-27 Thread jerry.min...@gmail.com
Hi Vadim,

I too am experimenting with SolrCloud and need help with setting it up
using Tomcat as the java servlet container.
While searching for help on this question, I found another thread in
the solr-mailing-list that is helpful.
In case you haven't seen this thread that I found, please search the
solr-mailing-list for: "SolrCloud new".
You can also view it at nabble using this link:
http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html

Best,
Jerry M.




On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann
v.kisselm...@googlemail.com wrote:

 Hello folks,

 i read the SolrCloud Wiki and Bruno Dumon's blog entry with his "First
 Exploration of SolrCloud".
 Examples and a first setup with embedded Jetty and ZK WORKS without problems.

 I tried to set up my own configuration with Tomcat and an external
 Zookeeper (my Master-ZK), but it doesn't really work.

 My setup:
 - latest Solr version from trunk
 - Tomcat 6
 - external ZK
 - Target: 1 Server, 1 Tomcat, 1 Solr instance, 2 collections with
 different config/schema

 What i tried:
 --
 1. After checkout i build solr (ant run-example); it works.
 ---
 2. I send my config/schema files to external ZK with Jetty:
 java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/
 -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar
 it works, too.
 ---
 3. I create my (empty, without cores) solr.xml, like Bruno:
 http://www.ngdata.com/site/blog/57-ng.html#disqus_thread
 ---
 4. I started my Tomcat, and get the first error:
 in UI: "This interface requires that you activate the admin request
 handlers, add the following configuration to your solrconfig.xml:
 <!-- Admin Handlers - This will register all the standard admin
 RequestHandlers. -->
 <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />"
 Admin request handlers are definitely activated in my solrconfig.

 I get this error only with the latest trunk versions, not with r1292064
 from February. Sometimes it works with the new version, sometimes
 not, and i get this error.

 --
 5. Ok, it works after a few restarts. i changed my JAVA_OPTS for
 Tomcat and added this: -DzkHost=master-zk:2181
 Next error:
 "The web application [/solr2] appears to have started a thread
 named [main-SendThread(master-zk:2181)] but has failed to stop it.
 This is very likely to create a memory leak."
 Exception in thread Thread-2 java.lang.NullPointerException
 at 
 org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
 at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
 at java.lang.Thread.run(Thread.java:662)
 15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
 INFO: Illegal access: this web application instance has been stopped
 already. Could not load org.apache.zookeeper.server.ZooTrace. The
 eventual following stack trace is caused by an error thrown for
 debugging purposes as well as to attempt to terminate the thread which
 caused the illegal access, and has no functional impact.
 java.lang.IllegalStateException
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
 15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy

 -
 6. Ok, we assume that the first steps work, and i create new
 cores and my 2 collections. My requests with the CoreAdminHandler are ok;
 my solr.xml looks like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080"
 hostContext="solr">
    <core
       name="shard1_data"
       collection="col1"
       shard="shard1"
       instanceDir="xxx/" />
    <core
       name="shard2_data"
       collection="col2"
       shard="shard2"
       instanceDir="xx2/" />
  </cores>
 </solr>

 Now i get the following exception: "...couldn't find conf name for
 collection1..."
 I don't have a collection1. Why this exception?

 ---
 You can see, there are many exceptions and possibly
 configuration problems with Tomcat and an external ZK.
 Has anyone set up an identical configuration and does it work?
 Does anyone detect mistakes in my configuration steps?

 Best regards
 Vadim


Re: First steps with Solr

2012-03-27 Thread Marcelo Carvalho Fernandes
I've had the same problem and my solution was to...

#set($pName = #field('name'))
#set($pName = $pName.trim())


Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


On Mon, Mar 26, 2012 at 3:24 PM, henri.gour...@laposte.net 
henri.gour...@laposte.net wrote:

 trying to play with javascript to clean-up my URL!!
 Context is velocity



 Suggestions?
 Thanks

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Auto-complete phrase

2012-03-27 Thread Rémy Loubradou
Hello, I am working on creating an auto-complete functionality for my field
merchant_name, present all over my documents. I am using version 3.4 of
Solr and I am trying to take advantage of the Suggester functionality.
Unfortunately, so far I haven't figured out how to make it work as I
expected.

If the list of merchants present in my documents is the following (my real
list is bigger, which is why I don't use a dictionary, and it will also
change often):
Redoute
Suisse Trois
Conforama
But
Cult Beauty
Brother Trois

I expect from the Suggester component to match words or part of them and
return phrases where words or part of them have been matched.
for example with /suggest?q=tro, I would like to get this:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="tro">
<int name="numFound">2</int>
<int name="startOffset">0</int>
<int name="endOffset">x</int>
<arr name="suggestion">
<str>Bother Trois</str>
<str>Suisse Trois</str>
</arr>
</lst>
</lst>
</lst>
</response>

I experimented with suggestions on a field configured with the tokenizer
solr.KeywordTokenizerFactory or solr.WhitespaceTokenizerFactory.
In my mind I have to find a way to handle 3 cases:
/suggest?q=bo -> (should return) bother trois
/suggest?q=tro -> (should return) bother trois, suisse trois
/suggest?q=bo%20tro -> (should return) bother trois

With the solr.KeywordTokenizerFactory I get:
/suggest?q=bo -> bother trois
/suggest?q=tro -> nothing
/suggest?q=bo%20tro -> nothing

With the solr.WhitespaceTokenizerFactory I get:
/suggest?q=bo -> bother
/suggest?q=troi -> trois
/suggest?q=bo%20tro -> bother, trois

Not exactly what I want ... :(

My configuration in the file solrconfig.xml for the suggester component:

<searchComponent class="solr.SpellCheckComponent" name="suggestMerchant">
<lst name="spellchecker">
  <str name="name">suggestMerchant</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
  <!-- Alternatives to lookupImpl:
   org.apache.solr.spelling.suggest.fst.FSTLookup   [finite state automaton]
   org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
   org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
   org.apache.solr.spelling.suggest.tst.TSTLookup   [ternary trees]
  -->
  <str name="field">merchant_name_autocomplete</str>  <!-- the indexed field to derive suggestions from -->
  <float name="threshold">0.0</float>
  <str name="buildOnCommit">true</str>
<!--
  <str name="sourceLocation">american-english</str>
-->
</lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler"
name="/suggest/merchant">
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggestMerchant</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.collate">true</str>
  <int name="spellcheck.maxCollations">10</int>
</lst>
<arr name="components">
  <str>suggestMerchant</str>
</arr>
  </requestHandler>

How can I implement autocomplete with the Suggester component to get what I
expect? Thanks for your help, I really appreciate it.
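
(One common alternative, sketched here rather than taken from the thread:
index the merchant name redundantly through an edge-ngram analyzer and query
that field directly, displaying a stored copy of the full phrase. The type
and field names below are illustrative:)

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "trois" is indexed as tr, tro, troi, trois, so q=tro matches mid-phrase words -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="merchant_name_ac" type="text_autocomplete" indexed="true" stored="true"/>

A query like q=merchant_name_ac:(bo AND tro) then matches only "Brother
Trois", and the stored merchant_name supplies the phrase to display.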


Re: Using the ids parameter

2012-03-27 Thread Jamie Johnson
Yes, sorry for the delay, we now do q=key:(key1 key2...) and that
works properly.

On Tue, Mar 27, 2012 at 3:53 AM, Dmitry Kan dmitry@gmail.com wrote:
 So I solved it by using key:(id1 OR ... idn).

 On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan dmitry@gmail.com wrote:

 Hi,

 Actually we ran into the same issue with the ids parameter, in the Solr
 front with a shards architecture (the exception is thrown in the Solr front). Were
 you able to solve it by using the key:value syntax or some other way?

 BTW, there was a related issue:
 https://issues.apache.org/jira/browse/SOLR-1477
 but it's marked as Won't Fix. Does anyone know why that is, or if this is
 planned to be resolved?

 Dmitry


 On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson jej2...@gmail.com wrote:

 We're running into an issue where we are trying to use the ids=
 parameter to return a set of documents given their id.  This seems to
 work intermittently when running in SolrCloud.  The first question I
 have is: is this something that we should be using, or should we instead
 be doing a query with key:?  The stack trace that I am getting right now
 is included below; any thoughts would be appreciated.

 Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
 INFO: [slice1_shard1] webapp=/solr path=/select

 params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4}
 status=500 QTime=32
 Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.NullPointerException
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
  at
 org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
  at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
  at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
  at
 org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
  at
 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
  at
 org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
  at
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
  at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
  at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
  at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)






 --
 Regards,

 Dmitry Kan


Re: First steps with Solr

2012-03-27 Thread Erik Hatcher
Note that the VelocityResponseWriter puts a tool in the context to escape 
various things.  See the "Velocity Context" section here: 
http://wiki.apache.org/solr/VelocityResponseWriter.  That'll take you to 
http://velocity.apache.org/tools/releases/1.4/generic/EscapeTool.html

You can do $esc.url($some_variable) to URL encode _pieces_ of a URL.   You can 
see the use of $esc in VM_global_library.vm and some of the other templates 
that ship with Solr.  
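
(A tiny sketch combining that with the #field macro from the earlier reply;
the field name and link target are made up:)

## grab a field value, then URL-encode it when building a link
#set($pName = "#field('name')")
<a href="/browse?q=$esc.url($pName)">$pName</a>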

Erik


On Mar 27, 2012, at 10:00 , Marcelo Carvalho Fernandes wrote:

 I've had the same problem and my solution was to...
 
 #set($pName = #field('name'))
 #set($pName = $pName.trim())
 
 
 Marcelo Carvalho Fernandes
 +55 21 8272-7970
 +55 21 2205-2786
 
 
 On Mon, Mar 26, 2012 at 3:24 PM, henri.gour...@laposte.net 
 henri.gour...@laposte.net wrote:
 
 trying to play with javascript to clean-up my URL!!
 Context is velocity
 
 
 
 Suggestions?
 Thanks
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread neosky
Does anyone know if it is a bug or not?
I use NGram in my index.

<fieldType name="text_general_rev" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="5"
maxGramSize="5"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_general_2NGram" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2"
maxGramSize="2"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

...

<field name="sequence" type="text_general_rev" indexed="true" stored="true"
termVectors="true" termPositions="true" termOffsets="true"/>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862326.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread Ahmet Arslan

Can you reproduce the problem with the latest trunk? 


 Does anyone know if it is a bug or not?
 I use NGram in my index.
 
 <fieldType name="text_general_rev" class="solr.TextField"
 positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.NGramTokenizerFactory" minGramSize="5"
 maxGramSize="5"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 </fieldType>
 <fieldType name="text_general_2NGram" class="solr.TextField"
 positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2"
 maxGramSize="2"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 </fieldType>
 
 ...
 
 <field name="sequence" type="text_general_rev" indexed="true" stored="true"
 termVectors="true" termPositions="true" termOffsets="true"/>
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862326.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread geeky2
hello,

should i apply the StopFilterFactory at index time or query time?

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml



<field name="itemDescSpell" type="textSpell"/>


  <fieldType name="textSpell" class="solr.TextField"
positionIncrementGap="100" stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3862722.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
Assuming you're just using this field for spellcheck and not for queries, then 
it doesn't matter.  But the correct way to do it is to have it in both places.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Tuesday, March 27, 2012 3:42 PM
To: solr-user@lucene.apache.org
Subject: RE: preventing words from being indexed in spellcheck dictionary?

hello,

should i apply the StopFilterFactory at index time or query time?

right now - per the schema below - i am applying it at BOTH index time and
query time.

is this correct?

thank you,
mark


// snipped from schema.xml



<field name="itemDescSpell" type="textSpell"/>


  <fieldType name="textSpell" class="solr.TextField"
positionIncrementGap="100" stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3862722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Unload(true) doesn't delete Index file when unloading a core

2012-03-27 Thread vybe3142
From what I understand, isn't deletion of the index files the expected result?

Thanks


public int drop(..., boolean removeIndex)   // <== removeIndex passed in as true
        throws Exception {
    String coreName = ...;
    Unload req = new Unload(removeIndex);
    req.setCoreName(coreName);
    SolrServer adminServer = buildAdminServer();
    ...
    // <== removes the reference to the Solr core in solr.xml,
    // but doesn't delete the index files
    return req.process(adminServer).getStatus();
}

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unload-true-doesn-t-delele-Index-file-when-unloading-a-core-tp3862816p3862816.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why my highlights are wrong (one character offset)?

2012-03-27 Thread neosky
My current version is Solr 3.5. It should be the most up to date.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862872.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why my highlights are wrong (one character offset)?

2012-03-27 Thread Koji Sekiguchi

What does your sequence field look like in schema.xml, both the fieldType and the field?
And what version are you using?

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/03/27 13:06), neosky wrote:

All of my highlights have a one-character mistake in the offset; some fragments
from my response follow. Thanks!

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">259</int>
<lst name="params">
<str name="explainOther"/>
<str name="indent">on</str>
<str name="hl.fl">sequence</str>
<str name="wt"/>
<str name="hl">true</str>
<str name="rows">10</str>
<str name="version">2.2</str>
<str name="fl">*,score</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="start">0</str>
<str name="q">sequence:NGNFN</str>
<str name="qt"/>
<str name="fq"/>
</lst>
</lst>
<lst name="highlighting">
<lst name="B9SUS0">
<arr name="sequence">
<str>TSQSEL<em>SNGNF</em>NRRPKIELSNFDGNHPKTWIRKC</str>
</arr>
</lst>
<lst name="Q01GW2">
<arr name="sequence">
<str>GENTRE<em>RNGNF</em>NSLTRERSFAELENHPPKVRRNGSEG</str>
</arr>
</lst>
<lst name="C5L0V0">
<arr name="sequence">
<str>EGRYPC<em>NNGNF</em>NLTTGRCVCEKNYVHLIYEDRI</str>
</arr>
</lst>
<lst name="C4JX93">
<arr name="sequence">
<str>YAEENY<em>INGNF</em>NEEPY</str>
</arr>
</lst>
<lst name="D7CK80">
<arr name="sequence">
<str>KEVADD<em>CNGNF</em>NQPTGVRI</str>
</arr>
</lst>
</lst>
</response>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860283p3860283.html
Sent from the Solr - User mailing list archive at Nabble.com.





RE: how to store file path in Solr when using TikaEntityProcessor

2012-03-27 Thread ZHANG Liang F
Could you please show me how to get those values inside TikaEntityProcessor?

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: March 27, 2012 22:43
To: solr-user@lucene.apache.org
Subject: Re: how to store file path in Solr when using TikaEntityProcessor


 I am using DIH to index the local file system, but the file path, size and
 lastmodified fields were not stored. In schema.xml I defined:

  <fields>
    <field name="title" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <!-- <field name="text" type="text" indexed="true" stored="true"/>
         liang added -->
    <field name="path" type="string" indexed="true" stored="true"/>
    <field name="size" type="long" indexed="true" stored="true"/>
    <field name="lastmodified" type="date" indexed="true" stored="true"/>
  </fields>
 
 
 And also defined tika-data-config.xml:
 
 <dataConfig>
     <dataSource name="bin" type="BinFileDataSource"/>
     <document>
         <entity name="f" dataSource="null" rootEntity="false"
                 processor="FileListEntityProcessor"
                 baseDir="E:/my_project/ecmkit/infotouch"
                 fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
                 onError="skip"
                 recursive="true">
             <entity name="tika-test" dataSource="bin"
                     processor="TikaEntityProcessor"
                     url="${f.fileAbsolutePath}" format="text"
                     onError="skip">
                 <field column="Author" name="author" meta="true"/>
                 <field column="title" name="title" meta="true"/>
                 <!-- <field column="text" name="text"/> -->
                 <field column="fileAbsolutePath" name="path"/>
                 <field column="fileSize" name="size"/>
                 <field column="fileLastModified" name="lastmodified"/>
             </entity>
         </entity>
     </document>
 </dataConfig>
 
 
 The Solr version is 3.5. Any idea?

The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified 
are generated by the FileListEntityProcessor, so the corresponding field 
mappings should be defined on that entity, above the TikaEntityProcessor entity.
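
For illustration, a sketch of the relevant part of the poster's tika-data-config.xml with those mappings moved up to the FileListEntityProcessor entity; the placement is the point here, while the paths and field names are the poster's own:

<entity name="f" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:/my_project/ecmkit/infotouch"
        fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
        onError="skip" recursive="true">
    <!-- implicit fields produced by FileListEntityProcessor, mapped here -->
    <field column="fileAbsolutePath" name="path"/>
    <field column="fileSize" name="size"/>
    <field column="fileLastModified" name="lastmodified"/>
    <entity name="tika-test" dataSource="bin"
            processor="TikaEntityProcessor"
            url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
    </entity>
</entity>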


Solr with UIMA

2012-03-27 Thread chris3001
I am having a hard time integrating UIMA with Solr. I have downloaded the
Solr 3.5 dist and have it running successfully with Nutch and Tika on
Windows 7, using Solr Cell and curl via Cygwin. To begin, I copied the 6 jars
from solr/contrib/uima/lib to the working /lib in Solr. Next, I read the
readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml
and schema.xml accordingly, to no avail. I then found this link, which seemed
a bit more applicable since I didn't care to use Alchemy or OpenCalais:
http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1 Still,
when I run a curl command that imports a PDF via Solr Cell, I do not get the
additional UIMA fields, nor do I get anything in my logs. The test.pdf is
parsed, though, and I see the PDF in Solr using:

curl
'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true'
-F file=@test.pdf

What I added to my solrconfig.xml:

<updateRequestProcessorChain name="uima">
  <processor
      class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\web\solrcelluimacrawler\com\rondhuit\uima\desc\KeyphraseExtractAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">com.rondhuit.uima.yahoo.Keyphrase</str>
          <lst name="mapping">
            <str name="feature">keyphrase</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
I also adjusted my requestHandler:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

Finally, my added entries in my schema.xml:

<field name="UIMAname" type="string" indexed="true" stored="true"
       multiValued="true" required="false"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true"/>

All I am trying to do is test *any* UIMA AE in Solr, and I cannot figure
out what I am doing wrong. Thank you in advance for reading this.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3863324.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StreamingUpdateSolrServer - exceptions not propagated

2012-03-27 Thread Mike Sokolov

On 3/27/2012 11:14 AM, Mark Miller wrote:

On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:


On 3/26/2012 6:43 PM, Mark Miller wrote:

It doesn't get thrown because that logic needs to continue - you don't 
necessarily want one bad document to stop all the following documents from 
being added. So the exception is sent to that method with the idea that you can 
override and do what you would like. I've written sample code around stopping 
and throwing an exception, but I guess it's not totally trivial. Other ideas for 
reporting errors have been thrown around in the past, but no work on it has 
gotten any traction.

It looks like StreamingUpdateSolrServer is not meant for situations where strict error 
checking is required.  I think the documentation should reflect that.  Would you be 
opposed to a javadoc update at the class level (plus a wiki addition) like the following? 
"Because document inserts are handled as background tasks, exceptions and errors 
that occur during those operations will not be available to the calling program, but they 
will be logged.  For example, if the Solr server is down, your program must determine 
this on its own.  If you need strict error handling, use CommonsHttpSolrServer."  If 
my wording is bad, feel free to make suggestions.

It might make sense to accumulate the errors in a fixed-size queue and 
report them either when the queue fills up or when the client commits 
(assuming the commit will wait for all outstanding inserts to complete 
or fail).  This is what we do client-side when performing multi-threaded 
inserts.  Sounds great in theory, I think, but then I haven't delved 
into SUSS at all ... just a suggestion, take it or leave it.  Actually I 
wonder whether SUSS is even necessary if you do the threading client-side? 
You might get a similar perf gain; I know we see a substantial speedup 
that way, because then your updates spawn multiple threads in the 
server anyway, don't they?
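
A minimal sketch of that accumulate-and-report idea, assuming SolrJ's StreamingUpdateSolrServer and its overridable handleError(Throwable) hook; the bounded queue and the rethrowIfFailed() helper are illustrative, not an existing API:

import java.net.MalformedURLException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class ErrorCollectingSolrServer extends StreamingUpdateSolrServer {

    // fixed-size queue: keeps the first 100 errors, drops the rest
    private final BlockingQueue<Throwable> errors =
            new ArrayBlockingQueue<Throwable>(100);

    public ErrorCollectingSolrServer(String url, int queueSize, int threads)
            throws MalformedURLException {
        super(url, queueSize, threads);
    }

    @Override
    public void handleError(Throwable ex) {
        super.handleError(ex); // keep the default logging
        errors.offer(ex);      // offer() never blocks; extra errors are dropped
    }

    // call after commit(), which blocks until outstanding requests finish
    public void rethrowIfFailed() throws Exception {
        Throwable first = errors.poll();
        if (first != null) {
            throw new Exception("at least " + (errors.size() + 1)
                    + " update(s) failed; first error: " + first.getMessage(), first);
        }
    }
}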


- Mike


Re: Auto-complete phrase

2012-03-27 Thread William Bell
I am also very confused by the use case for the Suggester component.
With collate on, it will try to combine random words together, not the
actual phrases that are there.

I get better mileage out of edge grams, tokenized on whitespace and
matched left to right, since that is how most people think.

However, I would like Suggester to work as follows:

Index:
Chris Smith
Tony Dawson
Chris Leaf
Daddy Golucky

Query:
1. "Chris" returns "Chris Leaf" but not both "Chris Smith" and "Chris Leaf".
2. I seem to get collations (taking the first word and combining it with
the second word), so I see things like "Smith Leaf". Very strange, and
not what we expect. These are formal names.

When I use Ngrams I can index:

C
Ch
Chr
Chri
Chris
S
Sm
Smi
Smit
Smith

Thus if I search on "Smi" it will match "Chris Smith", and "Chris" will
match both "Chris Smith" and "Chris Leaf". Exactly what I want.
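
A minimal schema.xml sketch of that edge-gram setup, assuming solr.EdgeNGramFilterFactory over whitespace tokens; the fieldType name and gram sizes here are illustrative:

<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes c, ch, chr, chri, chris ... for each token -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>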




On Tue, Mar 27, 2012 at 11:05 AM, Rémy Loubradou r...@hipsnip.com wrote:
 Hello, I am working on creating an auto-complete functionality for my field
 merchant_name, present all over my documents. I am using version 3.4 of
 Solr and I am trying to take advantage of the Suggester functionality.
 Unfortunately, so far I haven't figured out how to make it work as I
 expected.

 If my list of merchants present in my documents is as follows (my real list
 is bigger than this, which is why I don't use a dictionary, and also
 because it will change often):
 Redoute
 Suisse Trois
 Conforama
 But
 Cult Beauty
 Brother Trois

 I expect the Suggester component to match words, or parts of them, and
 return the phrases in which they matched.
 For example, with /suggest?q=tro I would like to get this:

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 </lst>
 <lst name="spellcheck">
 <lst name="suggestions">
 <lst name="tro">
 <int name="numFound">2</int>
 <int name="startOffset">0</int>
 <int name="endOffset">x</int>
 <arr name="suggestion">
 <str>Bother Trois</str>
 <str>Suisse Trois</str>
 </arr>
 </lst>
 </lst>
 </lst>
 </response>

 I experimented with suggestions on a field configured with the tokenizer
 solr.KeywordTokenizerFactory or solr.WhitespaceTokenizerFactory.
 In my mind I have to find a way to handle 3 cases:
 /suggest?q=bo -(should return) bother trois
 /suggest?q=tro -(should return) bother trois, suisse trois
 /suggest?q=bo%20tro -(should return) bother trois

 With the solr.KeywordTokenizerFactory I get:
 /suggest?q=bo - bother trois
 /suggest?q=tro - nothing
 /suggest?q=bo%20tro - nothing

 With the solr.WhitespaceTokenizerFactory I get:
 /suggest?q=bo - bother
 /suggest?q=troi - trois
 /suggest?q=bo%20tro - bother, trois

 Not exactly what I want ... :(

 My configuration in the file solrconfig.xml for the suggester component:

 <searchComponent class="solr.SpellCheckComponent" name="suggestMerchant">
    <lst name="spellchecker">
      <str name="name">suggestMerchant</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <!-- Alternatives to lookupImpl:
           org.apache.solr.spelling.suggest.fst.FSTLookup   [finite state automaton]
           org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
           org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
           org.apache.solr.spelling.suggest.tst.TSTLookup   [ternary trees]
      -->
      <!-- the indexed field to derive suggestions from -->
      <str name="field">merchant_name_autocomplete</str>
      <float name="threshold">0.0</float>
      <str name="buildOnCommit">true</str>
      <!--
      <str name="sourceLocation">american-english</str>
      -->
    </lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                  name="/suggest/merchant">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggestMerchant</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <int name="spellcheck.maxCollations">10</int>
    </lst>
    <arr name="components">
      <str>suggestMerchant</str>
    </arr>
  </requestHandler>

 How can I implement autocomplete with the Suggester component to get what I
 expect? Thanks for your help; I really appreciate it.



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: dataImportHandler: delta query fetching data, not just ids?

2012-03-27 Thread janne mattila
How did it work before the SOLR-811 change? I don't understand. Did it
fetch the delta data with two queries (1. get the ids, 2. get the data for
each id), or did it fetch all the delta data with a single query?

On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan iori...@yahoo.com wrote:
 2. If not - what's the reason delta import is implemented
 like it is? Why split it into two queries? I would think
 having a single delta query that fetches the data would be
 kind of an obvious design, unless there's something that
 calls for 2 separate queries...?

 I think this is it? https://issues.apache.org/jira/browse/SOLR-811
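
For context, the post-SOLR-811 two-query pattern looks roughly like this in a DIH data-config.xml (the table and column names below are made up for illustration): deltaQuery collects the ids of changed rows, and deltaImportQuery fetches the full row for each id:

<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>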