Re: Multi-words synonyms matching

2012-05-31 Thread Bernd Fehling

Are you sure about LUCENE_33 (use of BitVector)?


On 31.05.2012 17:20, O. Klein wrote:
> I have been struggling with this as well and found that using LUCENE_33 gives
> the best results.
> 
> But as it will be deprecated, this is not a lasting solution. Maybe somebody
> knows one?
> 


Re: How can I remove the home page priority of site home page from search results

2012-05-31 Thread Jack Krupansky
Add &debugQuery=true to your query and check how the home page is scored. 
That should give you a clue why the title is not boosting the score enough. 
Maybe you simply need a higher boost for title, but let the debugQuery 
scoring be your guide.


Actually, if you are explicitly referencing a field in your query 
("title:abc"), that won't pick up the title boost from the "qf" field list. 
You would need an explicit boost in the query itself.


But I'm not sure I understand how your query gets expanded: 
q=title:'.$keywords.'


Maybe you wanted: q=title:(.$keywords.), because otherwise spaces between 
the keywords would end the first "fielded term" and then proceed to 
reference the dismax field list (qf).
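To make the difference concrete, here is a minimal Python sketch of wrapping the keywords in parentheses before URL-encoding them. It is not the original poster's code (which builds the URL in PHP); the helper name `build_params` is hypothetical, and the parameter values are taken from the query quoted below.

```python
from urllib.parse import urlencode

def build_params(keywords):
    """Build edismax parameters with every keyword scoped to the title field."""
    # Parentheses keep all keywords inside the title: clause. Without them,
    # only the first keyword is fielded and the rest fall back to qf.
    return {
        "q": "title:(%s)" % keywords,
        "defType": "edismax",
        "qf": "title^10 url^9 content^5",
    }

# urlencode percent-escapes the colon, parentheses, carets and spaces.
query_string = urlencode(build_params("solr home page"))
print(query_string)
```

Keywords that contain query-syntax characters would still need escaping, which this sketch does not attempt.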


-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Friday, June 01, 2012 1:46 AM
To: solr-user@lucene.apache.org
Subject: How can I remove the home page priority of site home page from 
search results


My query is like this:

?q=title:'.$keywords.'&defType=edismax&qf=title^10 url^9
content^5&start=0&rows=10&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300

My results show site home page as the first result even though there are
other pages with title scoring more for the given keywords.

I need to give less priority to site home page than other pages. Please
help.

Thanks
Shameema 



Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky
Your earlier email had this option in your spellcheck.de field type analyzer 
for the StopFilterFactory:


words="german_stop_long.txt"

But your most recent email referred to "stopwords.txt".

So, either add "the" to german_stop_long.txt, or change the "words" option 
of your stopfilter to refer to "stopwords.txt".


BTW, I think you can actually have a comma-separated list of stopword files, 
so you can write:


words="german_stop_long.txt,stopwords.txt"

-- Jack Krupansky

-Original Message- 
From: Matthias Müller

Sent: Friday, June 01, 2012 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent


> spellcheck_de
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the solr example webapp makes suggestions for
stopwords: I've ...

1. added "the" to the stopwords.txt
2. added "thex" to an example document (field name)
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr"
http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex"

{ "response" : { "docs" : [ {...  "name" : "Solr, thex Enterprise
Search Server", ..  } ],
 "numFound" : 1,
...  },
...
 "spellcheck" : { "suggestions" : [ "the",
 {..."suggestion" : [ "thex" ]  }
   ] }
}


Here's the complete diff between the original download and my 3 
modifications:


diff -r apache-solr-3.6.0/example/exampledocs/solr.xml
apache-solr-3.6.0x/example/exampledocs/solr.xml
21c21
<   Solr, the Enterprise Search Server
---
>   Solr, thex Enterprise Search Server
diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml
apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
781a782,785
>
>    spellcheck
>
>
1122a1127
>   true
diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt
apache-solr-3.6.0x/example/solr/conf/stopwords.txt
14a15,16
>
> the




Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
> spellcheck_de
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the solr example webapp makes suggestions for
stopwords: I've ...

1. added "the" to the stopwords.txt
2. added "thex" to an example document (field name)
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr"
http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex"

{ "response" : { "docs" : [ {...  "name" : "Solr, thex Enterprise
Search Server", ..  } ],
  "numFound" : 1,
...  },
...
  "spellcheck" : { "suggestions" : [ "the",
  {..."suggestion" : [ "thex" ]  }
] }
}


Here's the complete diff between the original download and my 3 modifications:

diff -r apache-solr-3.6.0/example/exampledocs/solr.xml
apache-solr-3.6.0x/example/exampledocs/solr.xml
21c21
<   Solr, the Enterprise Search Server
---
>   Solr, thex Enterprise Search Server
diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml
apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
781a782,785
>  
>spellcheck
>  
>
1122a1127
>   true
diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt
apache-solr-3.6.0x/example/solr/conf/stopwords.txt
14a15,16
>
> the


Re: index special characters solr

2012-05-31 Thread Jack Krupansky
Special characters are filtered out of (most) "text" fields, but are 
preserved in "string" fields. String fields might suit your needs, but are 
inconvenient for keyword searching.


You may be able to use the "types" option of the WordDelimiterFilterFactory 
to pass in a custom character type table that has the special characters 
treated as alphabetic characters. Otherwise, you may have to customize the 
code yourself.


-- Jack Krupansky

-Original Message- 
From: KPK

Sent: Thursday, May 31, 2012 7:38 PM
To: solr-user@lucene.apache.org
Subject: index special characters solr

Hi all
Can somebody please tell me how I can build an index in Solr where one of my
fields contains special characters like $ and %?
I would also like to search for the same characters in that particular field.

Any advice would be appreciated.

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Challenge: Is dynamic data source possible for DataImportHandler JdbcDataSource?

2012-05-31 Thread Cheng Zhang
Hi,

The challenge I'm facing is some sort of dynamic data source. Your valuable 
input is highly appreciated.

Below is my data-config.xml. I have one user database and two company 
databases. The user table in the user database has four columns which are id + 
name + company_dbname + company_id. Depending on the company_dbname, I need to 
look up either companydb0 or companydb1 to get the company name by the 
company_id. 


    

    

    

    
    
    
    
          
        
            
    
    


Is it possible to set the data source dynamically for the child entity? In my 
case, I would like to set the company entity's dataSource to 
"${USER.company_dbname}", which is returned from the USER entity query.

If it's not possible with the current implementation, I would like to download the 
source code and customize it for my needs. Which Java source file should I 
start with?

Many many thanks,

Kevin

index special characters solr

2012-05-31 Thread KPK
Hi all
Can somebody please tell me how I can build an index in Solr where one of my
fields contains special characters like $ and %?
I would also like to search for the same characters in that particular field.

Any advice would be appreciated.

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157.html


Re: Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada
Hi Jack,

Thanks for your help.

I delete conf/data/* on every restart to make sure I work with clean data.

Is there any other configuration I should change? Maybe another XML file?

Kind regards

On Thu, May 31, 2012 at 5:18 PM, Jack Krupansky wrote:

> It looks okay; renaming a column is fine.
>
> Maybe when you re-run it, DIH is not replacing any documents that
> already have IDs in Solr, leaving them with their old field values. Maybe
> you need to manually delete the old Solr documents and run a fresh full
> import.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Rafael Taboada
> Sent: Thursday, May 31, 2012 5:13 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: Data Import Handler fields with different values in column
> and name
>
>
> Please,
>
> Can anyone guide me through this issue? Thanks
>
>
>
> -- Forwarded message --
> From: Rafael Taboada 
> Date: Thu, May 31, 2012 at 12:30 PM
> Subject: Data Import Handler fields with different values in column and
> name
> To: solr-user@lucene.apache.org
>
>
> Hi folks,
>
> I'm using Solr 3.6 and I'm trying to import data from my database to solr
> using Data Import Handler. My db-config is like this:
>
> 
>   url="jdbc:oracle:thin:@**localhost:1521:XE" user="admin" password="admin"
> />
>  
> 
>
>
>
> 
>  
> 
>
> My problem arises when I try to use different values for column and name in
> the field tag, for example
>
>
>
> When I use a name that differs from the column, the field is omitted. Can
> you please help me with this issue?
>
> My schema.xml is:
>
> 
>  />
>  
>
>  
> 
>  required="true" />
>  />
>  stored="true" />
>  
>
> Thanks in advance!
>
> --
> Rafael Taboada
>
>
>
>
>
>
> --
> Rafael Taboada
>
> /*
> * Phone >> 992 741 026
> */
>



-- 
Rafael Taboada

/*
 * Phone >> 992 741 026
 */


Re: Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Jack Krupansky

It looks okay; renaming a column is fine.

Maybe when you re-run it, DIH is not replacing any documents that 
already have IDs in Solr, leaving them with their old field values. Maybe 
you need to manually delete the old Solr documents and run a fresh full 
import.


-- Jack Krupansky

-Original Message- 
From: Rafael Taboada

Sent: Thursday, May 31, 2012 5:13 PM
To: solr-user@lucene.apache.org
Subject: Fwd: Data Import Handler fields with different values in column and 
name


Please,

Can anyone guide me through this issue? Thanks



-- Forwarded message --
From: Rafael Taboada 
Date: Thu, May 31, 2012 at 12:30 PM
Subject: Data Import Handler fields with different values in column and name
To: solr-user@lucene.apache.org


Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:


  
  
 



 
  


My problem arises when I try to use different values for column and name in
the field tag, for example



When I use a name that differs from the column, the field is omitted. Can
you please help me with this issue?

My schema.xml is:


 
  

  
 
 
 
 
  

Thanks in advance!

--
Rafael Taboada






--
Rafael Taboada

/*
* Phone >> 992 741 026
*/ 



Re: Solr with UIMA

2012-05-31 Thread Jack Krupansky
Is it failing on the first document? I see "uid=5", which suggests that it is not. 
If not, how is this document different from the others?


I see the exception
org.apache.uima.resource.ResourceInitializationException, suggesting that 
some file cannot be loaded.


It sounds like it may be having trouble loading "aePath" ("analysisEngine"). 
Or maybe some other file?


-- Jack Krupansky

-Original Message- 
From: debdoot

Sent: Thursday, May 31, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr with UIMA

Hi Tommaso,

I have followed the steps you have listed to try to deploy the example
RoomNumberAnnotator with Solr 3.5.
Here is the error trace that I get:


org.apache.solr.common.SolrException: processing error: null. uid=5,
text="Test Room HAW GN-K35..."
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:192)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:89)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:919)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1016)
at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:3703)
at com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:304)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:953)
at com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1655)
at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:195)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:452)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:511)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:305)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:276)
at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)
at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)
at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1650)
Caused by: org.apache.uima.resource.ResourceInitializationException
at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:86)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:144)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
... 30 more
Caused by: java.lang.NullPointerException
at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:58)
... 32 more

at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:624)
at com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:642)
at com.ibm.ws.webcontainer.srt.SRTServletResponse.sendError(SRTServletResponse.java:1235)
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)



Please let me know if you have any insights on what could be the issue.

Thanks in advance,
Debdoot


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987056.html



Re: Strip html

2012-05-31 Thread Chris Hostetter

: I apply an XSLT transformation which returns:
: ---
: si les ruches d’abeilles prouvent la
:   monarchie, les fourmillières, les troupes d’éléphants ou
: de castors prouvent la république.
: ---
: I put this HTML in Solr:  $doc->addField('body_strip_html', $body_norm);   
...
: But this doesn't work!
: I want this XML file to be returned (see the example) if I search for "castor".

I'm confused.

a) You said you've already transformed your input XML into plain text -- 
so I don't see why you need HTML stripping at all.
b) Your current problem doesn't seem to have anything to do with HTML or 
XML ... you're asking why a document containing "castors" (plural) doesn't 
match a query for "castor" (singular), but the field type you say you are using 
has a very simple analyzer that doesn't do any stemming of any kind...

>>
>>
>>
>>

...since there is no HTML in your input, HTMLStripCharFilterFactory is a 
no-op, which leaves StandardTokenizerFactory, which just does 
tokenization.

It seems like all you need to do is add a stemmer (and, for efficiency, 
remove the HTMLStripCharFilterFactory).  I'm no expert, but it looks like 
you are indexing French, so I would suggest using a French stemmer...

https://wiki.apache.org/solr/LanguageAnalysis#French
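
As a rough illustration of why the plural does not match, here is a toy Python sketch. The `tokenize` and `toy_stem` helpers are hypothetical stand-ins, not Solr's StandardTokenizerFactory or any real French stemmer, and the "drop a trailing s" rule is a deliberate oversimplification.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a (Latin-1) letter."""
    return [t for t in re.split(r"[^a-zà-ÿ]+", text.lower()) if t]

def toy_stem(token):
    """Toy stemmer: drop one trailing 's' from tokens longer than 3 letters."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

doc = "les troupes d'éléphants ou de castors prouvent la république"

# Tokenization only: the index holds "castors", so "castor" is not found.
plain_index = set(tokenize(doc))
# Stemming at index time and at query time maps both forms to "castor".
stemmed_index = {toy_stem(t) for t in tokenize(doc)}

print("castor" in plain_index)
print(toy_stem("castor") in stemmed_index)
```

The point is only that the same normalization has to run on both the indexed text and the query term; a real stemmer handles far more than plural endings.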



-Hoss

Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread jmlucjav
There is at least one scenario where no error is reported when it should be:
if the host runs out of disk space while optimizing, the failure is not reported.

I think there is a JIRA issue open for this.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110p3987144.html


Fwd: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada
Please,

Can anyone guide me through this issue? Thanks



-- Forwarded message --
From: Rafael Taboada 
Date: Thu, May 31, 2012 at 12:30 PM
Subject: Data Import Handler fields with different values in column and name
To: solr-user@lucene.apache.org


Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:


   
   
  
 
 
 
  
   


My problem arises when I try to use different values for column and name in
the field tag, for example

 

When I use a name that differs from the column, the field is omitted. Can
you please help me with this issue?

My schema.xml is:


  
   

   
  
  
  
  
   

Thanks in advance!

-- 
Rafael Taboada






-- 
Rafael Taboada

/*
 * Phone >> 992 741 026
 */


Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Rahul Warawdekar
Hi,

That's correct.
For failure, you have to check for the text *"Indexing failed. Rolled back
changes"* under the  tag.
One more thing to note is that there may be a period during the indexing
process when indexing is complete but the index has not yet been committed
and optimized.
To treat the import as a complete success, check that the response listed
below is present along with the success message.

*2012-05-31 15:10:45
2012-05-31 15:10:45*
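
A minimal Python sketch of that extra check follows. It assumes the caller has already pulled the status message and the two timestamps out of the response; the function and parameter names (`is_complete_success`, `committed_at`, `optimized_at`) are hypothetical, chosen here for illustration only.

```python
def is_complete_success(message, committed_at=None, optimized_at=None):
    """True only if indexing finished and both timestamps are present."""
    # The success text alone is not enough: the index may not have been
    # committed and optimized yet when the message first appears.
    finished = message.startswith("Indexing completed.")
    return finished and bool(committed_at) and bool(optimized_at)

msg = "Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents."
print(is_complete_success(msg, "2012-05-31 15:10:45", "2012-05-31 15:10:45"))
print(is_complete_success(msg))
```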

On Thu, May 31, 2012 at 3:42 PM, geeky2  wrote:

> hello all,
>
> i have been asked to write a small polling script (bash) to periodically
> check the status of an import on our Master.  our import times are small,
> but there are business reasons why we want to know the status of an import
> after a specified amount of time.
>
> i need to perform certain actions based on the "status" of the import, and
> therefore need to quantify which tags to check and their appropriate
> states.
>
> i am using the command from the DataImportHandler HTTP API to get the
> status
> of the import:
>
> OUTPUT=$(curl -v
> http://${SERVER}:${PORT}/somecore/dataimport?command=status)
>
>
>
>
> can someone tell me if i have these rules correct?
>
> 1) during an import - the status tag will have a busy state:
>
> example:
>
>  busy
>
> 2) at the completion of an import (regardless of failure or success) the
> status tag will have an "idle" state:
>
> example:
>
>  idle
>
>
> 3) to determine if an import failed or succeeded - you must interrogate the
> tags underand specifically look for :
>
> success:
> Indexing completed. Added/Updated: 603378 documents. Deleted 0
> documents.
>
> failure:
> Indexing completed. Added/Updated: 603378 documents. Deleted 0
> documents.
>
> thank you,
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
>



-- 
Thanks and Regards
Rahul A. Warawdekar


RE: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Dyer, James
You've got it right.  Here's a summary:

- "status" = "busy" means it's in progress.  
- "status" = "idle" means it's finished (success or failure).
- You can drill down further by looking at sub-elements under "statusMessages":
 > if there is  , it means the last import was cancelled with "command=abort"
 > look at the body of .  
   o If it begins with "Indexing completed.", then it finished with a success.
   o If it begins with "Indexing failed.", then it finished with a failure.

Just be careful to test your script whenever you change DIH versions.  This 
status screen isn't the best and no doubt it will change sometime in the 
future.  Also, keep in mind that as soon as the next import begins the old 
statuses get lost so you'll need to plan your script runs around that.
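
The rules summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the caller is assumed to have already extracted the "status" value and the status message body from the response, and the function name `classify_import` and its return labels are hypothetical. The aborted case is left out because it depends on an element this sketch does not try to name.

```python
def classify_import(status, message=""):
    """Map a DIH status value and status message to a coarse state."""
    if status == "busy":
        return "running"          # import still in progress
    if status != "idle":
        return "unknown"          # anything other than busy/idle
    if message.startswith("Indexing completed."):
        return "success"
    if message.startswith("Indexing failed."):
        return "failure"
    return "unknown"              # idle, but no recognizable message

print(classify_import("busy"))
print(classify_import("idle", "Indexing completed. Added/Updated: 603378 documents."))
print(classify_import("idle", "Indexing failed. Rolled back changes"))
```

Returning "unknown" rather than guessing keeps a polling script from treating an unexpected response as a success.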

Someday it'll be nice if we can come up with a better way than this to 
programmatically interact with DIH...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Thursday, May 31, 2012 2:43 PM
To: solr-user@lucene.apache.org
Subject: possible status codes from solr during a (DIH) data import process

hello all,

i have been asked to write a small polling script (bash) to periodically
check the status of an import on our Master.  our import times are small,
but there are business reasons why we want to know the status of an import
after a specified amount of time.

i need to perform certain actions based on the "status" of the import, and
therefore need to quantify which tags to check and their appropriate states.

i am using the command from the DataImportHandler HTTP API to get the status
of the import:

OUTPUT=$(curl -v
http://${SERVER}:${PORT}/somecore/dataimport?command=status)




can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state:

example:

  busy

2) at the completion of an import (regardless of failure or success) the
status tag will have an "idle" state:

example:

  idle


3) to determine if an import failed or succeeded - you must interrogate the
tags underand specifically look for :

success: 
Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.

failure: 
Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.

thank you,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html


Re: index merge

2012-05-31 Thread sudarshan
Hi All,
   I have a basic question about index merging in Solr.  The setup that I
have followed is as follows:

Setup:
I used the schema.xml that comes with the Solr example. I had three cores:
core0, core1 and core2.   I tried merging the indexes of core0 and core1
into core2.  I copied the same schema.xml from SOLR_HOME/example/solr/conf to
core0 and core1 but changed only the name field, to core0 and core1
respectively.
 
Operations:
I indexed different files to core0 and core1. The search *:* in Solr showed
6 files and 9 files for core0 and core1 respectively.  I then merged the
indexes of core0 and core1 into core2. As expected, the search *:* showed 15
files for core2. I added 2 new files to the index of core0 and 1 file to
core1 and merged again into core2. This time, to my surprise, *:* showed a
total of 33 files (15 + 18) instead of just 18. This
duplication continued with each merge operation, which is not efficient. Also,
the merged files were available for search only after restarting the Jetty
server. Am I missing something or doing something wrong? Is there a way to
restart only a specific core so that it reads the new index and reflects the
merged changes? Please explain the merge operation.

Thanks,
Sudarshan   



--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-merge-tp472904p3987121.html


Re: Cannot get highlighting to work

2012-05-31 Thread Jack Krupansky
Try a query that uses a term that doesn't split an alphanumeric term into 
two terms.


Then check to see what field type you used for the symbol and marker_symbol 
fields and whether the analyzer for that field type has changed in 3.6.





-- Jack Krupansky
-Original Message- 
From: Asfand Qazi

Sent: Thursday, May 31, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Cannot get highlighting to work

Hello,

I am having problems getting highlighting to work on a Solr 3.6 instance, while it
was working just fine before on our 1.4 instance.

The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml

(please note the incorrect line wrapping - it should be on one line)


https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml

(please note the incorrect line wrapping - it should be on one line)


The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true

(please note the incorrect line wrapping - it should be on one line)

I expect a section like:
{
  MGI:105369: {
symbol: [
  "Cbx1"
],
marker_symbol: [
  "Cbx1"
]
  }
}


I get:
{
  MGI:105369: { }
}


Can anyone help?

Thanks


--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute


--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE. 



Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Lance Norskog
Can you add a new stored procedure that uses your current one? It
would operate like the DIH expects.

I don't remember if DB cursors are a standard part of JDBC. If they
are, it would be a great addition to the DIH if they work right.

On Thu, May 31, 2012 at 10:44 AM, Niran Fajemisin  wrote:
> Thanks for your response, Michael. Unfortunately changing the stored 
> procedure is not really an option here.
>
> From what I'm seeing, it would appear that there's really no way of somehow 
> instructing the Data Import Handler to get a handle on the output parameter 
> from the stored procedure. It's a bit surprising though that no one has run 
> into this scenario but I suppose most people just work around it.
>
> Anyone else care to shed some more light on alternative approaches? Thanks 
> again.
>
>
>
>>
>> From: Michael Della Bitta 
>>To: solr-user@lucene.apache.org
>>Sent: Thursday, May 31, 2012 9:40 AM
>>Subject: Re: Using Data Import Handler to invoke a stored procedure with 
>>output (cursor) parameter
>>
>>I could be wrong about this, but Oracle has a table() function that I
>>believe turns the output of a function into a table. So possibly you
>>could wrap your procedure in a function that returns the cursor, or
>>convert the procedure to a function.
>>
>>Michael Della Bitta
>>
>>
>>Appinions, Inc. -- Where Influence Isn’t a Game.
>>http://www.appinions.com
>>
>>
>>On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin  wrote:
>>> Hi all,
>>>
>>> I've seen a few questions asked around invoking stored procedures from 
>>> within Data Import Handler but none of them seem to indicate what type of 
>>> output parameters were being used.
>>>
>>> I have a stored procedure created in Oracle database that takes a couple 
>>> input parameters and has an output parameter that is a reference cursor. 
>>> The cursor is expected to be used as a way of iterating through the 
>>> returned table rows. I'm using the following format to invoke my stored 
>>> procedure in the Data Import Handler's data config XML:
>>>
>>>  ...
>>>
>>> I have tested that this query works prior to attempting to use it from 
>>> within the DIH. But when I attempt to invoke this stored procedure, it 
>>> naturally complains that the output parameter is not specified (essentially 
>>> a mismatch in the number of parameters).
>>>
>>> I don't know of any way to pass in a cursor parameter (or any output 
>>> parameter for that matter) to the stored procedure invocation from within 
>>> the  definition.  I would greatly appreciate if anyone could 
>>> provide any pointers or hints on how to proceed.
>>>
>>> Thanks so much for your time
>>>
>>
>>
>>



-- 
Lance Norskog
goks...@gmail.com


Re: Merging Remote Solr Indexes?

2012-05-31 Thread Lance Norskog
Merging indexes is not really useful- it won't make distributed search
any faster. There are features that don't work with distributed
search. Really, you are better off having shards with enough documents
so that relevance scoring is balanced.

On Thu, May 31, 2012 at 11:04 AM, sudarshan
 wrote:
> Hi All,
>       I'm new to Solr. I saw this post about merging indexes, and I
> have a similar question. From the post, I understand that merging indexes
> across different cores is possible only if the cores exist on a single
> machine.     I want to merge indexes from different machines. Can you please
> explain the different ways of doing this?
>
> Say I have N+1 Solr engines of which there are N different masters and the
> remaining 1 is meant for merging all N indexes together.  How I have decided
> to merge N indexes to 1 is this.
>
> 1. Dynamically edit the solrconfig.xml file of the N+1st system to point as
> a slave to different master each time. Hence a total of N trials would be
> needed to cover all N masters.
> 2. During every trial I shall replicate the index of the master and store it
> in a different folder. Say index1 from master1, index2 from master2 .
> indexn from masterN.
> 3. After all indexes are replicated and moved/renamed to local directory, I
> shall perform a merge of all indexes.
>
>
> What problems will I have in implementing this? How efficient would this be?
> I believe all index folders will have to be available locally to perform
> merging. If not, please tell me how I can better merge remote indexes.
>
> Another question I have is about MergeFactor. If I set the mergeFactor to 5,
> will Solr automatically take care of merging the segments into 1 when the
> number of segments reaches 5? How can this be exploited?
>
> Your assistance is sincerely appreciated.
>
> Regards,
> Sudarshan
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Merging-Remote-Solr-Indexes-tp3434412p3987090.html



-- 
Lance Norskog
goks...@gmail.com


Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-31 Thread Walter Underwood
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

The defaults are very good. I have never changed them, and I've had Solr in 
production at two major sites, Netflix and Chegg.

Don't spend any more time worrying about merges.

wunder

On May 31, 2012, at 10:51 AM, sudarshan wrote:

> Walter,
> Thanks again. Can you specify the criteria based on which Solr
> optimizes/force merges segments automatically.  Is this defined by the
> MergeFactor parameter - like if the mergefactor is 10, then merge happens
> for every 10 segments? Please explain. 
> 
> Thanks,
> Sudarshan 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3987086.html
> Sent from the Solr - User mailing list archive at Nabble.com.
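
The behaviour described on the wiki page linked above can be modelled in a few lines. This is only a simplified sketch of a log-style merge policy, not Solr's or Lucene's actual code: the function name `simulate_flushes` and the idea of counting segments per "level" are made up for illustration, and other merge policies (for example the tiered one) differ in detail. It shows that merging is triggered automatically once mergeFactor segments of the same size exist, and that this is not the same as a full optimize.

```python
def simulate_flushes(merge_factor, flushes):
    """Return segment counts per level after `flushes` new segments.

    levels[0] counts freshly flushed segments, levels[1] counts segments
    produced by merging `merge_factor` level-0 segments, and so on.
    Hypothetical model for illustration only.
    """
    levels = []
    for _ in range(flushes):
        if not levels:
            levels.append(0)
        levels[0] += 1
        i = 0
        # Whenever a level fills up, its segments are merged into one
        # segment on the next level; this can cascade upwards.
        while levels[i] == merge_factor:
            levels[i] = 0
            if i + 1 == len(levels):
                levels.append(0)
            levels[i + 1] += 1
            i += 1
    return levels


if __name__ == "__main__":
    for flushes in (4, 5, 99, 100):
        factor = 5 if flushes < 10 else 10
        levels = simulate_flushes(factor, flushes)
        print(factor, flushes, levels, "segments:", sum(levels))
```

In this model an index rarely sits at exactly one segment; only an explicit optimize forces that.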







possible status codes from solr during a (DIH) data import process

2012-05-31 Thread geeky2
hello all,

i have been asked to write a small polling script (bash) to periodically
check the status of an import on our Master.  our import times are small,
but there are business reasons why we want to know the status of an import
after a specified amount of time.

i need to perform certain actions based on the "status" of the import, and
therefore need to quantify which tags to check and their appropriate states.

i am using the command from the DataImportHandler HTTP API to get the status
of the import:

OUTPUT=$(curl -v
http://${SERVER}:${PORT}/somecore/dataimport?command=status)




can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state:

example:

  busy

2) at the completion of an import (regardless of failure or success) the
status tag will have an "idle" state:

example:

  idle


3) to determine if an import failed or succeeded - you must interrogate the
tags underand specifically look for :

success: 
Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.

failure: 
Indexing completed. Added/Updated: 603378 documents. Deleted 0
documents.

thank you,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
Sent from the Solr - User mailing list archive at Nabble.com.
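
A minimal sketch of the decision logic asked about above, written in Python rather than bash so it can be tested in isolation. It assumes the status response is XML with a top-level `str` element named `status` and a `lst` named `statusMessages` whose unnamed `str` entry carries the summary text; the "Indexing failed" prefix is an assumption about what a failed import reports, since the failure example in the post did not survive. The function name `classify_import` is hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_import(status_xml):
    """Map a DIH status response to 'running', 'succeeded', 'failed' or 'unknown'."""
    root = ET.fromstring(status_xml)
    status = None
    summary = ""
    for child in root:
        name = child.get("name")
        if child.tag == "str" and name == "status":
            status = (child.text or "").strip()
        elif child.tag == "lst" and name == "statusMessages":
            # The summary line is assumed to be the entry with an empty name.
            for msg in child:
                if msg.get("name") == "":
                    summary = (msg.text or "").strip()
    if status == "busy":
        return "running"
    if status == "idle":
        if summary.startswith("Indexing completed"):
            return "succeeded"
        if summary.startswith("Indexing failed"):  # assumed failure wording
            return "failed"
    return "unknown"
```

A polling script would fetch the status URL, pass the body to this function, and sleep while the result is "running". Anything reported as "unknown" is best treated as needing a human look rather than as success.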


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky
Spellcheck wants a field, not a field type. You have a spellcheck_de field 
type, but you need a field as well.


spellcheck_de

That should reference a field, not a field type.

-- Jack Krupansky

-Original Message- 
From: Matthias Müller

Sent: Thursday, May 31, 2012 3:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent


is it possible to configure a stopword list to the SpellCheckComponent?



Add a stopwordfilter to your spellcheck field.


Hmm, I did. Could it be another mistake?

This is the schema definition:

   
 
   
   
   
   
   
 
   

This is the solrconfig:

 

  edismax
  10
  text_de title_de^5
  text_de title_de^5

  true
  0



  spellcheck_de

 


 
   textSpell
   
 default
 spellcheck_de
 spellchecker_de
 true
 true
   
  



Fwd: Strip html

2012-05-31 Thread Michael Della Bitta
If I'm not mistaken, that's TEI, and I suggest you consult with the
TEI community for strategies for document indexing, as there are a lot
of branching-style tags in TEI. My guess is that you'll hear that it's
best to perform some sort of term expansion on the document as a
preprocessing step.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com





-Original Message- From: Tigunn
Sent: Thursday, May 31, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Strip html


Hello,
I have a full-text index over XML files.
Example:
---

                          

si les ruches d’abeilles
>
>                                     prouvent la
>                  monarchie, les fourmillières, les troupes d’éléphants ou
> de 
>                                    
>                                        C
>                                        c
>                                    astors prouvent la
> république.

                              
                          
                      
---
I use Solr 1.4.1 to run full-text search from PHP. When I search for "castor",
I can't find this document, but if I search for "c astor" it matches. That is the problem.

I apply an XSLT transformation which returns:
---
si les ruches d’abeilles prouvent la
                monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
---
i put this html in solr:  $doc->addField('body_strip_html', $body_norm);

In schema.xml:

      
              
              
      
  

AND

 


But this doesn't work!
I want this XML file (see the example above) to be returned when I search for "castor".

Can you help me, please?
thanks.


--
View this message in context:
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
>> is it possible to configure a stopword list to the SpellCheckComponent?

> Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:


  





  


This is the solrconfig:

  
 
   edismax
   10
   text_de title_de^5
   text_de title_de^5

   true
   0
 

 
   spellcheck_de
 
  


  
textSpell

  default
  spellcheck_de
  spellchecker_de
  true
  true

  


Re: Strip html

2012-05-31 Thread Jack Krupansky
There is no option in the Strip HTML filter to discard whitespace between 
elements. And it certainly doesn't know the semantics of some XML schema for 
"choice". You'll have to pre-process that semantics before Solr ingestion, 
or do your own custom filter.


-- Jack Krupansky

-Original Message- 
From: Tigunn

Sent: Thursday, May 31, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Strip html

Hello,
I have a full-text index over XML files.
Example:
---

   

si les ruches d’abeilles

 prouvent la
  monarchie, les fourmillières, les troupes d’éléphants ou
de 

C
c
astors prouvent la
république.

   
   
   
---
I use Solr 1.4.1 to run full-text search from PHP. When I search for "castor",
I can't find this document, but if I search for "c astor" it matches. That is the problem.

I apply an XSLT transformation which returns:
---
si les ruches d’abeilles prouvent la
 monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
---
i put this html in solr:  $doc->addField('body_strip_html', $body_norm);

In schema.xml:

   
   
   
   
   

AND

  


But this doesn't work!
I want this XML file (see the example above) to be returned when I search for "castor".

Can you help me, please?
thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html
Sent from the Solr - User mailing list archive at Nabble.com. 
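
The pre-processing step suggested above could look roughly like the sketch below. It assumes the markup is TEI-like, with a `choice` element wrapping an original reading and a regularized reading; the child element names `orig` and `reg`, the sample document, and the function names are assumptions, because the tags in the quoted example were lost. The idea is to emit only the regularized reading and to drop the whitespace inside `choice`, so that "c" and "astors" are joined before the text reaches Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumed, not taken from the post.
SAMPLE = """<p>les troupes d'éléphants ou de
  <choice>
    <orig>C</orig>
    <reg>c</reg>
  </choice>astors prouvent la république.</p>"""


def collect_text(node, parts):
    """Append the text of `node` to `parts`, resolving <choice> to its <reg>."""
    if node.tag == "choice":
        reg = node.find("reg")
        if reg is not None and reg.text:
            # Keep only the regularized reading, without surrounding whitespace.
            parts.append(reg.text.strip())
        return
    if node.text:
        parts.append(node.text)
    for child in node:
        collect_text(child, parts)
        if child.tail:
            parts.append(child.tail)


def flatten(xml_text):
    """Return the plain text of an XML fragment with whitespace normalized."""
    parts = []
    collect_text(ET.fromstring(xml_text), parts)
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(flatten(SAMPLE))
```

The flattened string can then be sent to a plain text field. Note that this only joins the word when the text following the closing `choice` tag starts without whitespace, as it appears to in the example.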



Re: Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada
Jack,

Thanks for your help.

I restarted Solr every time I changed schema.xml.

Every doc about this says it is possible to map a column to a field with a
different name, but I can't get it to work.

Thanks again.

Rafael

On Thu, May 31, 2012 at 1:27 PM, Jack Krupansky wrote:

> Is there any chance that you added the "anotherasunto" field and then
> forgot to shut down and reload Solr? Any time you edit schema.xml or
> solrconfig.xml you need to reload Solr for the changes to take effect.
>
> -- Jack Krupansky
>
> -Original Message- From: Rafael Taboada
> Sent: Thursday, May 31, 2012 1:30 PM
> To: solr-user@lucene.apache.org
> Subject: Data Import Handler fields with different values in column and
> name
>
>
> Hi folks,
>
> I'm using Solr 3.6 and I'm trying to import data from my database to solr
> using Data Import Handler. My db-config is like this:
>
> 
>   url="jdbc:oracle:thin:@**localhost:1521:XE" user="admin" password="admin"
> />
>  
> 
>
>
>
> 
>  
> 
>
> My problem is when I'm trying to use a different values in the field tag,
> for example
>
>
>
> When I use different name from column, this field is omitted. Please can
> you help me with this issue?
>
> My schema.xml is:
>
> 
>  />
>  
>
>  
> 
>  required="true" />
>  />
>  stored="true" />
>  
>
> Thanks in advance!
>
> --
> Rafael Taboada
>



-- 
Rafael Taboada

/*
 * Phone >> 992 741 026
 */


Re: Data Import Handler fields with different values in column and name

2012-05-31 Thread Jack Krupansky
Is there any chance that you added the "anotherasunto" field and then forgot 
to shut down and reload Solr? Any time you edit schema.xml or solrconfig.xml 
you need to reload Solr for the changes to take effect.


-- Jack Krupansky

-Original Message- 
From: Rafael Taboada

Sent: Thursday, May 31, 2012 1:30 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler fields with different values in column and name

Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:


  
  
 



 
  


My problem is when I'm trying to use a different values in the field tag,
for example



When I use different name from column, this field is omitted. Please can
you help me with this issue?

My schema.xml is:


 
  

  
 
 
 
 
  

Thanks in advance!

--
Rafael Taboada 



Re: Merging Remote Solr Indexes?

2012-05-31 Thread sudarshan
Hi All,
   I'm new to Solr. I saw this post about merging indexes, and I
have a similar question. From the post, I understand that merging indexes
across different cores is possible only if the cores exist on a single
machine. I want to merge indexes from different machines. Can you please
explain the different ways of doing this?

Say I have N+1 Solr engines of which there are N different masters and the
remaining 1 is meant for merging all N indexes together.  How I have decided
to merge N indexes to 1 is this.

1. Dynamically edit the solrconfig.xml file of the N+1st system to point as
a slave to different master each time. Hence a total of N trials would be
needed to cover all N masters.
2. During every trial I shall replicate the index of the master and store it
in a different folder. Say index1 from master1, index2 from master2 .
indexn from masterN.
3. After all indexes are replicated and moved/renamed to local directory, I
shall perform a merge of all indexes.


What problems will I have in implementing this? How efficient would this be?
I believe all index folders will have to be available locally to perform
merging. If not, please tell me a better way to merge remote indexes.

Another question I have is about mergeFactor. If I set the mergeFactor to 5,
will Solr automatically take care of merging the segments into 1 when the
number of segments reaches 5? How can this be exploited?

Your assistance is sincerely appreciated.

Regards,
Sudarshan

 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-Remote-Solr-Indexes-tp3434412p3987090.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-31 Thread sudarshan
Walter,
 Thanks again. Can you specify the criteria based on which Solr
optimizes/force merges segments automatically.  Is this defined by the
MergeFactor parameter - like if the mergefactor is 10, then merge happens
for every 10 segments? Please explain. 

Thanks,
Sudarshan 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3987086.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Stop Words in SpellCheckComponent

2012-05-31 Thread Markus Jelsma
Add a stopwordfilter to your spellcheck field.
 
-Original message-
> From:Matthias Müller 
> Sent: Thu 31-May-2012 18:39
> To: solr-user@lucene.apache.org
> Subject: Stop Words in SpellCheckComponent
> 
> Hi,
> 
> is it possible to configure a stopword list to the SpellCheckComponent?
> 
> For example:
> When searching for "the indexs" "the" is filtered, because it is a stopword.
> The SpellCheckComponent gives me a false suggestion for "the".
> But the SpellCheckComponent should only give a suggestion for "index"
> because "the" is a stopword.
> 
> Kind Regards
> 
> Matthias
> 


Re: Solr with UIMA

2012-05-31 Thread debdoot
Further observation on the error:

All requests to add documents through the /update URL land up with the same
error, irrespective of the fields contained in the document. If I don't use
the UIMAUpdateRequestProcessor, I can add/update documents successfully.

Here are the snippets relevant to updateRequestProcessor declarations in my
solrconfig.xml



 
   
   
 uima
   
  



  

  
  
  C:\ex1\RoomNumberAnnotator.xml
  false
  
  
false

  content

  
  

  org.apache.uima.tutorial.RoomNumber
  
building
UIMAname
  

  

  
  
  



Please help.

Thanks
Debdoot

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Niran Fajemisin
Thanks for your response, Michael. Unfortunately changing the stored procedure 
is not really an option here. 

From what I'm seeing, it would appear that there's really no way of somehow 
instructing the Data Import Handler to get a handle on the output parameter 
from the stored procedure. It's a bit surprising though that no one has run 
into this scenario, but I suppose most people just work around it.

Anyone else care to shed some more light on alternative approaches? Thanks 
again.



>
> From: Michael Della Bitta 
>To: solr-user@lucene.apache.org 
>Sent: Thursday, May 31, 2012 9:40 AM
>Subject: Re: Using Data Import Handler to invoke a stored procedure with 
>output (cursor) parameter
> 
>I could be wrong about this, but Oracle has a table() function that I
>believe turns the output of a function into a table. So possibly you
>could wrap your procedure in a function that returns the cursor, or
>convert the procedure to a function.
>
>Michael Della Bitta
>
>
>Appinions, Inc. -- Where Influence Isn’t a Game.
>http://www.appinions.com
>
>
>On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin  wrote:
>> Hi all,
>>
>> I've seen a few questions asked around invoking stored procedures from 
>> within Data Import Handler but none of them seem to indicate what type of 
>> output parameters were being used.
>>
>> I have a stored procedure created in Oracle database that takes a couple 
>> input parameters and has an output parameter that is a reference cursor. The 
>> cursor is expected to be used as a way of iterating through the returned 
>> table rows. I'm using the following format to invoke my stored procedure in 
>> the Data Import Handler's data config XML:
>>
>>  ...
>>
>> I have tested that this query works prior to attempting to use it from 
>> within the DIH. But when I attempt to invoke this stored procedure, it 
>> naturally complains that the output parameter is not specified (essentially 
>> a mismatch in the number of parameters).
>>
>> I don't know of any way to pass in a cursor parameter (or any output 
>> parameter for that matter) to the stored procedure invocation from within 
>> the  definition.  I would greatly appreciate if anyone could provide 
>> any pointers or hints on how to proceed.
>>
>> Thanks so much for your time
>>
>
>
>

Data Import Handler fields with different values in column and name

2012-05-31 Thread Rafael Taboada
Hi folks,

I'm using Solr 3.6 and I'm trying to import data from my database to solr
using Data Import Handler. My db-config is like this:


   
   
  
 
 
 
  
   


My problem is when I'm trying to use a different values in the field tag,
for example

 

When I use different name from column, this field is omitted. Please can
you help me with this issue?

My schema.xml is:


  
   

   
  
  
  
  
   

Thanks in advance!

-- 
Rafael Taboada


Strip html

2012-05-31 Thread Tigunn
Hello,
I have a full-text index over XML files.
Example:
---



si les ruches d’abeilles
>  prouvent la
>   monarchie, les fourmillières, les troupes d’éléphants ou
> de 
> 
> C
> c
> astors prouvent la
> république.



---
I use Solr 1.4.1 to run full-text search from PHP. When I search for "castor",
I can't find this document, but if I search for "c astor" it matches. That is the problem.

I apply an XSLT transformation which returns:
---
si les ruches d’abeilles prouvent la
  monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
---
i put this html in solr:  $doc->addField('body_strip_html', $body_norm);   

In schema.xml:







AND

   


But this doesn't work!
I want this XML file (see the example above) to be returned when I search for "castor".

Can you help me, please?
thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr with UIMA

2012-05-31 Thread debdoot
Hi Tommaso,

I have followed the steps you have listed to try to deploy the example
RoomNumberAnnotator with Solr 3.5.
Here is the error trace that I get:


org.apache.solr.common.SolrException: processing error: null. uid=5, 
text="Test Room HAW GN-K35..."
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:192)
at
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:89)
at
com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:919)
at
com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1016)
at
com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:3703)
at
com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:304)
at
com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:953)
at
com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1655)
at
com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:195)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:452)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:511)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:305)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:276)
at
com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)
at
com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)
at
com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
at
com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
at
com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
at 
com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
at
com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1650)
Caused by: org.apache.uima.resource.ResourceInitializationException
at
org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:86)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:144)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
... 30 more
Caused by: java.lang.NullPointerException
at
org.apache.uima.util.XMLInputSource.(XMLInputSource.java:118)
at
org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:58)
... 32 more

at
com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:624)
at
com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:642)
at
com.ibm.ws.webcontainer.srt.SRTServletResponse.sendError(SRTServletResponse.java:1235)
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)



Please let me know if you have any insights on what could be the issue.

Thanks in advance,
Debdoot


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
Hi,

is it possible to configure a stopword list to the SpellCheckComponent?

For example:
When searching for "the indexs" "the" is filtered, because it is a stopword.
The SpellCheckComponent gives me a false suggestion for "the".
But the SpellCheckComponent should only give a suggestion for "index"
because "the" is a stopword.

Kind Regards

Matthias


Cannot get highlighting to work

2012-05-31 Thread Asfand Qazi

Hello,

I am having problems getting highlighting to work on a Solr 3.6 instance, while it 
was working just fine before on our 1.4 instance.


The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml

(please note the incorrect line wrapping - it should be on one line)


https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml

(please note the incorrect line wrapping - it should be on one line)


The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true

(please note the incorrect line wrapping - it should be on one line)

I expect a section like:
{
  MGI:105369: {
symbol: [
  "Cbx1"
],
marker_symbol: [
  "Cbx1"
]
  }
}


I get:
{
  MGI:105369: { }
}


Can anyone help?

Thanks


--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute


--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Re: per-fieldtype similarity not working

2012-05-31 Thread Robert Muir
On Thu, May 31, 2012 at 11:23 AM, Markus Jelsma
 wrote:

> We simply declare the following in our fieldType:
> 
>

That's not enough; see the example:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/schema-sim.xml


-- 
lucidimagination.com


per-fieldtype similarity not working

2012-05-31 Thread Markus Jelsma
Hi,

We intend to use different similarity implementations for some field types, 
configured according to SOLR-2338. I double-checked against the schema in 
test-files and everything seems fine. However, the result is not correct and 
debugQuery shows the default configured similarity implementation is being used.

We simply declare the following in our fieldType:



Thanks,
Markus


Re: Multi-words synonyms matching

2012-05-31 Thread O. Klein
I have been struggling with this as well and found that using LUCENE_33 gives
the best results.

But as it will be deprecated, this is not a lasting solution. Maybe somebody
knows one?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Accent Characters

2012-05-31 Thread Vicente Couto
Hello, guys.

Now it's working. Thank you both Jack and Sami.
I fixed my issue by just using server.query(query, METHOD.POST) in solrJ and
yes, I was using HttpSolrServer. I have to move on to CommonsHttpSolrServer.

Thank you very much.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Accent-Characters-tp3985931p3987046.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Michael Della Bitta
I could be wrong about this, but Oracle has a table() function that I
believe turns the output of a function into a table. So possibly you
could wrap your procedure in a function that returns the cursor, or
convert the procedure to a function.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin  wrote:
> Hi all,
>
> I've seen a few questions asked around invoking stored procedures from within 
> Data Import Handler but none of them seem to indicate what type of output 
> parameters were being used.
>
> I have a stored procedure created in Oracle database that takes a couple 
> input parameters and has an output parameter that is a reference cursor. The 
> cursor is expected to be used as a way of iterating through the returned 
> table rows. I'm using the following format to invoke my stored procedure in 
> the Data Import Handler's data config XML:
>
>  ...
>
> I have tested that this query works prior to attempting to use it from within 
> the DIH. But when I attempt to invoke this stored procedure, it naturally 
> complains that the output parameter is not specified (essentially a mismatch 
> in the number of parameters).
>
> I don't know of any way to pass in a cursor parameter (or any output parameter 
> for that matter) to the stored procedure invocation from within the  
> definition.  I would greatly appreciate if anyone could provide any pointers 
> or hints on how to proceed.
>
> Thanks so much for your time
>


RE: spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Markus Jelsma
Thanks James, that works nicely!
 
 
-Original message-
> From:Dyer, James 
> Sent: Thu 31-May-2012 16:05
> To: solr-user@lucene.apache.org
> Subject: RE: spellcheck collate with fq parameters SOLR-2010
> 
> Markus,
> 
> When you set "spellcheck.maxCollationTries" to a value greater than zero, the 
> spellchecker will query each collation candidate to determine how many hits 
> it would return.  If the collation will not yield any hits, it throws it away 
> then tries some more (up to whatever value you set).  You can verify the 
> correctness of this by setting "spellcheck.maxCollationTries" to zero (no 
> checking) and then re-trying the collation(s) it suggests by hand (with the 
> same "fq" params, etc).
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
> Sent: Thursday, May 31, 2012 8:45 AM
> To: solr-user@lucene.apache.org
> Subject: spellcheck collate with fq parameters SOLR-2010
> 
> Hi,
> 
> It seems it doesn't work, or I cannot get it to work. I've tried both the 
> IndexSpellchecker in Solr 3.2 and the DirectSpellchecker of trunk. The 
> correctly spelled flag is correct when the fq parameters are taken into account, 
> but the collation never is when a filter is used. I've also tried 
> spellcheck.maxCollationTries on trunk, but any value higher than 0 (even a very 
> high one) makes the collation element disappear. Are there any (open) issues 
> that I'm not aware of?
> 
> Thanks,
> Markus
> 


RE: spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Dyer, James
Markus,

When you set "spellcheck.maxCollationTries" to a value greater than zero, the 
spellchecker will query each collation candidate to determine how many hits it 
would return.  If the collation will not yield any hits, it throws it away then 
tries some more (up to whatever value you set).  You can verify the correctness 
of this by setting "spellcheck.maxCollationTries" to zero (no checking) and 
then re-trying the collation(s) it suggests by hand (with the same "fq" params, 
etc).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Thursday, May 31, 2012 8:45 AM
To: solr-user@lucene.apache.org
Subject: spellcheck collate with fq parameters SOLR-2010

Hi,

It seems it doesn't work, or I cannot get it to work. I've tried both the 
IndexSpellchecker in Solr 3.2 and the DirectSpellchecker of trunk. The 
correctly spelled flag is correct when the fq parameters are taken into account, 
but the collation never is when a filter is used. I've also tried 
spellcheck.maxCollationTries on trunk, but any value higher than 0 (even a very 
high one) makes the collation element disappear. Are there any (open) issues 
that I'm not aware of?

Thanks,
Markus
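
The verification loop described in the reply above can be sketched as follows. This is an illustration of the idea only, not the SpellCheckComponent's implementation: `pick_collations`, `count_hits` and `max_collations` are hypothetical names, and `count_hits` stands in for "run the candidate query with the same fq parameters and return the number of hits".

```python
def pick_collations(candidates, count_hits, max_tries, max_collations=1):
    """Return up to `max_collations` candidate queries.

    With max_tries <= 0 nothing is verified and the first candidates are
    returned as-is. Otherwise at most `max_tries` candidates are tested
    and only those that would return hits are kept.
    """
    if max_tries <= 0:
        return list(candidates[:max_collations])
    accepted = []
    for candidate in candidates[:max_tries]:
        if count_hits(candidate) > 0:
            accepted.append(candidate)
            if len(accepted) == max_collations:
                break
    return accepted


if __name__ == "__main__":
    # Toy stand-in for an index restricted by a filter query.
    hits = {"solr index": 3}
    candidates = ["solar index", "solr index"]
    print(pick_collations(candidates, lambda q: hits.get(q, 0), max_tries=0))
    print(pick_collations(candidates, lambda q: hits.get(q, 0), max_tries=5))
```

Under this model an empty result means that none of the tested candidates returned hits under the filter, which would match the symptom of the collation element disappearing.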


Re: XInclude Multiple Elements

2012-05-31 Thread Bogdan Nicolau
I've also tried a lot of tricks to get xpointer working with multiple child
elements, to no success. 
In the end, I've resorted to a less pretty, other-way-around solution. I do
something like this:
solrconfig_common.xml -> no xml declaration, no root tag, no nothing


...
For each file that I need the common stuff into, I'd do something like this:
solrconfig_master.xml/solrconfig_slave.xml/etc.


]>


&solrconfigcommon;



Solr starts with 0 warnings, the configuration is properly loaded, etc.
Property substitution also works, including inside the
solrconfig_common.xml. Hope it helps anyone.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/XInclude-Multiple-Elements-tp3167658p3987029.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hightlighting and excerpt

2012-05-31 Thread Ahmet Arslan
> I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB 
> was stressed?

Hi Tolga, 

I think you can easily learn the basics from one of the following books.
http://lucene.apache.org/solr/books.html



Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky

The Solr example. As in the Solr tutorial.

See:
http://lucene.apache.org/solr/api/doc-files/tutorial.html

Index books.json from exampledocs and then enter a /browse request in your 
web browser. Add the "&wt=xml" query parameter so that you can see the raw 
XML response that shows the "highlighting" section rather than the 
VelocityWriter output.


Since you said that highlighting was working for you, please post an example 
of the "highlighting" section of a Solr response.


-- Jack Krupansky

-Original Message- 
From: Tolga

Sent: Thursday, May 31, 2012 9:42 AM
To: solr-user@lucene.apache.org
Subject: Re: Hightlighting and excerpt

You mean http:///www.example.com:8983/solr/browse? It says "unknown
field 'cat'"

On 5/31/12 4:16 PM, Jack Krupansky wrote:
Yes, that is what highlighting does - it extracts an excerpt and 
highlights search terms. You said you have highlighting working, so what 
else is it that you need?


Try "/browse" in the Solr example. It does exactly what your example 
shows. So, what else is it that you are trying to do? Or if something 
isn't working, what specifically isn't working?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Hightlighting and excerpt

I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB
was stressed?

On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in "excerpts" 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution with highlight, but what can I do about the excerpt?

Thanks and regards,






spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Markus Jelsma
Hi,

It seems it doesn't work, or I cannot get it to work. I've tried both the 
IndexSpellchecker in Solr 3.2 and the DirectSpellchecker of trunk. The 
correctly spelled flag is correct when considering the fq parameters, but the 
collation never is when using a filter. I've also tried 
spellcheck.maxCollationTries on trunk, but any value higher than 0 (even very 
high) makes the collation element disappear. Are there any (open) issues 
that I'm not aware of?

Thanks,
Markus


Re: Hightlighting and excerpt

2012-05-31 Thread Tolga
You mean http:///www.example.com:8983/solr/browse? It says "unknown 
field 'cat'"


On 5/31/12 4:16 PM, Jack Krupansky wrote:
Yes, that is what highlighting does - it extracts an excerpt and 
highlights search terms. You said you have highlighting working, so 
what else is it that you need?


Try "/browse" in the Solr example. It does exactly what your example 
shows. So, what else is it that you are trying to do? Or if something 
isn't working, what specifically isn't working?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Hightlighting and excerpt

I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB
was stressed?

On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in "excerpts" 
(snippets or fragments from a text field), what else is it that you 
need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution with highlight, but what can I do about the 
excerpt?


Thanks and regards, 




Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky
Yes, that is what highlighting does - it extracts an excerpt and highlights 
search terms. You said you have highlighting working, so what else is it 
that you need?


Try "/browse" in the Solr example. It does exactly what your example shows. 
So, what else is it that you are trying to do? Or if something isn't 
working, what specifically isn't working?


-- Jack Krupansky

-Original Message- 
From: Tolga

Sent: Thursday, May 31, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Hightlighting and excerpt

I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB
was stressed?

On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in "excerpts" 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution with highlight, but what can I do about the excerpt?

Thanks and regards, 




Efficiently mining or parsing data out of XML source files

2012-05-31 Thread Van Tassell, Kristian
I'm just wondering what the general consensus is on indexing XML data to Solr 
in terms of parsing and mining the relevant data out of the file and putting 
them into Solr fields. Assume that this is the XML file and resulting Solr 
fields:

XML data:

foo

garbage data


Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, 
however, is not acceptable. 2800 objects take 11 minutes to parse and index 
into Solr.

The big slowdown appears to be that I'm parsing the data with an XML parser.

So, now I'm testing mining the data by opening the file as just a text file 
(using Groovy) and picking out relevant data using regular expression matching. 
I'm now able to parse (mine) the data and index the 2800 files in 72 seconds.

So I'm wondering if the typical solution people use is to go with a non-XML 
solution. It seems to make sense considering the search index would only want 
to store (as much data) as possible and not rely on the incoming documents 
being xml compliant.

Thanks in advance for any thoughts on this!
-Kristian
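
A rough Python sketch of the regular-expression approach described above (the original was done in Groovy). The tags of the sample XML were lost in this archive, so the element and attribute names below are purely hypothetical; only the resulting field names (Id, Title, Bar) come from the message. Note that regex matching like this only holds up while the files keep a predictable layout.

```python
import re

# Hypothetical markup: the real element and attribute names are not known.
SAMPLE = '<doc id="1234"><title>foo</title><bar attr="val1">garbage data</bar></doc>'

# One pattern per Solr field; each captures exactly one value.
FIELD_PATTERNS = {
    "Id": re.compile(r'<doc\s+id="([^"]*)"'),
    "Title": re.compile(r"<title>(.*?)</title>", re.DOTALL),
    "Bar": re.compile(r'<bar\s+attr="([^"]*)"'),
}


def mine_fields(text):
    """Pick the relevant values out of the raw file text without an XML parser."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


print(mine_fields(SAMPLE))
```

Fields whose pattern does not match are simply left out, so a malformed file yields a partial document rather than a parser error.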









Re: Hightlighting and excerpt

2012-05-31 Thread Tolga
I need something like http://cl.ly/2o2E0g0S422d2p1X203h . See how TCMB 
was stressed?


On 5/31/12 3:54 PM, Jack Krupansky wrote:
Since highlighting, by definition, does highlight terms in "excerpts" 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

-Original Message- From: Tolga
Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution with highlight, but what can I do about the excerpt?

Thanks and regards,


Re: Hightlighting and excerpt

2012-05-31 Thread Jack Krupansky
Since highlighting, by definition, does highlight terms in "excerpts" 
(snippets or fragments from a text field), what else is it that you need?


-- Jack Krupansky

-Original Message- 
From: Tolga

Sent: Thursday, May 31, 2012 4:55 AM
To: solr-user@lucene.apache.org
Subject: Hightlighting and excerpt

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be
able to highlight my keyword and display an excerpt containing that
keyword. I found a solution with highlight, but what can I do about the excerpt?

Thanks and regards, 



Re: Query elevation / boosting or something else to guarantee document position

2012-05-31 Thread Michael Kuhlmann

Hi Wenca,

I'm a bit late, but maybe you're still interested.

There's no such functionality in standard Solr. With sorting, this is 
not possible, because sort functions only rank each single document, 
they know nothing about the position of the others. And query elevation 
is similar, you'll raise the score of independent documents.


To achieve this, you'll need your own QueryComponent. This isn't too 
complicated. You can't easily change the SolrIndexSearcher, which does 
the search job. But you can subclass 
org.apache.solr.handler.component.QueryComponent and override 
process(). Alas, the single main line - searcher.search() - is buried 
deep in the huge monster method process(), and you first have to check 
for shards, grouping and twenty thousand other parameters until you 
arrive at the code line you may want to expand.


Before calling search(), set the GET_DOCSET flag in your QueryCommand 
object, then execute the search. To check whether there's a document of 
the particular manufacturer in the result list, you can either
a) fetch the appropriate field value from the default field cache for 
every single result document until you find one; or
b) call getDocSet() on the SolrIndexSearcher with the manufacturer query 
as the parameter, and perform an and() operation on the resulting 
DocSet with the DocSet of your main query. (That's why you set the flag 
before.) You can then check which document that matches both the 
manufacturer and the main query fits best.


If you found a matching document, but it's behind pos. 5 in the 
resulting DocList, then you simply have to re-order your list.


If there's no such document within the DocList (which is limited by your 
rows parameter), but there are some in the joined DocSet from strategy 
b), then you can simply choose one of them and ignore the fact that this 
is probably not the best matching one. Or you have to patch Solr and 
modify getDocListNC() in SolrIndexSearcher (or one of the Collector 
classes), which is much more complicated.
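
The re-ordering step itself is simple. Here is a small sketch of the idea in plain Python, not Solr code: ranked_ids stands in for the DocList, preferred_ids for the intersection of the two DocSets, and both names (like the function name) are made up for illustration.

```python
def promote_into_top(ranked_ids, preferred_ids, top_n=5):
    """Return a copy of ranked_ids with a preferred doc inside the first top_n.

    ranked_ids: doc ids in result order (assumed unique).
    preferred_ids: set of doc ids matching the manufacturer query.
    """
    # Nothing to do if a preferred document is already near the top.
    if any(doc in preferred_ids for doc in ranked_ids[:top_n]):
        return list(ranked_ids)
    # Otherwise take the best-ranked preferred document further down ...
    for doc in ranked_ids[top_n:]:
        if doc in preferred_ids:
            reordered = [d for d in ranked_ids if d != doc]
            # ... and move it to the last slot of the top_n window.
            reordered.insert(top_n - 1, doc)
            return reordered
    # No preferred document in this page of results at all.
    return list(ranked_ids)
```

The last case is the one where you would have to pick a document from the joined DocSet instead, because the DocList is cut off by the rows parameter.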


Good luck!
-Kuli

Am 29.05.2012 14:26, schrieb Wenca:

Hi all,

I have an index with thousands of products with various fields
(manufacturer, price, popularity, type, color, ...) and I want to
guarantee at least one product by a particular manufacturer to be within
the first 5 results.

The search is done mainly by using filter params and results are ordered
by function e.g.: "product(price, popularity) asc" or by "discount desc"

And I need to guarantee that if there is any product matching the given
filters made by a concrete manufacturer, then it will be on the 5th
position at worst, even if the position by the order function is worse.

It seems to me that the Query elevation component is not the right thing
for me. I don't know the query in advance (or the set of filter
criteria) and I don't know concrete product that will be the best for
the criteria within the order.

And also I don't think that I can construct a function with such
requirements to use it directly for ordering the results.

Of course I can make a second query in case there is no desired product
on the first page of results and put it there, but it requires
additional request to solr and complicates results processing and
further pagination.

Can anybody suggest any solution?

Thanks
Wenca




Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-05-31 Thread Niran Fajemisin
Hi all,

I've seen a few questions asked around invoking stored procedures from within 
Data Import Handler but none of them seem to indicate what type of output 
parameters were being used.

I have a stored procedure created in an Oracle database that takes a couple of input 
parameters and has an output parameter that is a reference cursor. The cursor 
is expected to be used as a way of iterating through the returned table rows. 
I'm using the following format to invoke my stored procedure in the Data Import 
Handler's data config XML:

 ...

I have tested that this query works prior to attempting to use it from within 
the DIH. But when I attempt to invoke this stored procedure, it naturally 
complains that the output parameter is not specified (essentially a mismatch in 
the number of parameters).

I don't know of any way to pass in a cursor parameter (or any output parameter 
for that matter) to the stored procedure invocation from within the  
definition.  I would greatly appreciate if anyone could provide any pointers or 
hints on how to proceed.

Thanks so much for your time



Re: difference between Katta and SolrCloud (replicator factor)

2012-05-31 Thread Jamel ESSOUSSI
Hi,

responses please

-- Jamel E

--
View this message in context: 
http://lucene.472066.n3.nabble.com/difference-between-Katta-and-SolrCloud-replicator-factor-tp3986791p3986998.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework

2012-05-31 Thread Wunderlich, Tobias
Thanks for all the responses. I went with the UpdateRequestProcessor and it 
works.


-Ursprüngliche Nachricht-
Von: Lance Norskog [mailto:goks...@gmail.com] 
Gesendet: Samstag, 26. Mai 2012 01:53
An: solr-user@lucene.apache.org
Betreff: Re: Creating custom Filter / Tokenizer / Request Handler for 
integration of NER-Framework

Another problem (just discovered this): TokenizerFactories do not get resource 
handlers. So, you can't go read config or model files for your Tokenizer. 
TokenFilters do, so you can use the KeywordTokenizer (make one big term) and do 
your work in a TokenFilter that gets the whole thing.

On Thu, May 24, 2012 at 7:33 AM, Jan Høydahl  wrote:
> As Ahmet says, The Update Chain is probably the place to integrate such 
> document oriented processing.
> See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ for how it 
> integrates with Solr.
>
> --
> Jan Høydahl, search solution architect Cominvent AS - 
> www.facebook.com/Cominvent Solr Training - www.solrtraining.com
>
> On 24. mai 2012, at 14:04, Wunderlich, Tobias wrote:
>
>> Hey Guys,
>>
>> I have recently been working on a project to integrate a 
>> Named-Entity-Recognition-Framework (NER) into an existing search platform based 
>> on Solr. The Platform uses ManifoldCF to automatically gather the content 
>> from various repositories. The NER-Framework creates Annotations/Metadata 
>> from given content which I then want to integrate into the search-platform 
>> as metadata to use for faceting. Since MCF handles all content gathering, I 
>> need a way to integrate the NER-Framework directly into Solr. The Goal is to 
>> get all Annotations per document into a multivalued field.  My first thought 
>> was to create a custom filter, which just takes the content and gives back 
>> only the Annotations.  But as I understand it, a filter only processes 
>> predetermined Tokens, which is useless for my purpose, since the 
>> NER-Framework needs to process the whole content of a document. What about a 
>> custom Tokenizer? Would it be possible to process the whole text and give 
>> back only the Annotations as Tokens? A third thought was to manipulate the 
>> ExtractRequestHandler (Solr Cell) used by MCF to somehow add the Annotations 
>> as Metadata when the content and metadata is distributed to the different 
>> fields.
>>
>> I hope my problem description is sufficient. Does anybody have any thoughts 
>> on that subject?
>>
>> Best regards,
>> Tobias
>



--
Lance Norskog
goks...@gmail.com


Hightlighting and excerpt

2012-05-31 Thread Tolga

Hi,

Two separate things asked in one thread...

I am crawling my websites with nutch. When I index them, I'd like to be 
able to highlight my keyword and display an excerpt containing that 
keyword. I found a solution with highlight, but what can I do about the excerpt?


Thanks and regards,


Re: Poll: What do you use for Solr performance monitoring?

2012-05-31 Thread Vadim Kisselmann
Hi Otis,
Done :) So far we use Graphite, Ganglia and Zabbix, and JStatsD for our
JVM monitoring.
Best regards
Vadim


2012/5/31 Otis Gospodnetic :
> Hi,
>
> Super quick poll:  What do you use for Solr performance monitoring?
> Vote here: 
> http://blog.sematext.com/2012/05/30/poll-what-do-you-use-for-solr-performance-monitoring/
>
>
> I'm collecting data for my Berlin Buzzwords talk that will touch on Solr, so 
> your votes will be greatly appreciated!
>
> Thanks,
> Otis


Re: how to read fieldValueCacheStatistics

2012-05-31 Thread elisabeth benoit
ok, thanks a lot for the answer.

Elisabeth

2012/5/31 Chris Hostetter 

>
> : When I read fieldValueCache statistics I have something that looks like
> :
> : item_ABC_FACET :
> :
> {field=ABC_FACET,memSize=4224,tindexSize=32,time=92,phase1=92,nTerms=0,bigTerms=0,termInstances=0,uses=11}
> :
> :
> : is there a doc somewhere that explains what are
>
> ...technically that's one stat, showing you an "UnInvertedField"
> instance in the cache (that's the string-ification of that
> UnInvertedField)
>
> the specifics of what those numbers mean are definitely what i would
> consider "expert level" ... off the top of my head the only ones i am
> fairly sure of are:
>
> memSize - how many bytes of ram it's using
> time - how long it took to build
> nTerms - number of unique terms in that field
> bigTerms - number of "big" terms, ie: terms that have such a high docFreq,
> they weren't un-inverted because it would be too inefficient.
>
> In general, this level of detail is the kind of thing where you should
> probably review the code.
>
>
> -Hoss
>
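
If you want to track those numbers over time, the stat string is easy to split apart. A small Python sketch, assuming the value always has the flat {key=value,...} shape shown above; the function name is made up.

```python
def parse_cache_stat(text):
    """Turn '{field=ABC_FACET,memSize=4224,...}' into a dict.

    Purely numeric values become ints; everything else stays a string.
    """
    body = text.strip().strip("{}")
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        stats[key] = int(value) if value.isdigit() else value
    return stats


stat = parse_cache_stat(
    "{field=ABC_FACET,memSize=4224,tindexSize=32,time=92,phase1=92,"
    "nTerms=0,bigTerms=0,termInstances=0,uses=11}"
)
print(stat["field"], stat["memSize"], stat["nTerms"])
```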