Re: Not able to search Spanish word with accent in Solr
Hi, I am having the same kind of issue. I am not able to search accented Spanish characters, e.g. Según, próximos, etc. I have a field called attr_content which holds the content of a PDF file whose contents are in Spanish. I am using Apache Tika to index the contents of the PDF file. I have written a Java class which uses the Apache Tika classes to read the PDF contents and index them into Solr 3.5. Is there anything I might have missed? Could it be an encoding issue? Please help. Deep
Automatic cross linking
Hello, I'm looking to use Solr to create cross links in text. For example: I'd like to be able to send a text field (an article from my blog) in a request, and have Solr use a script/method to parse the text, find all matching category terms, and return the results. Do you have any suggestions, documentation, tutorials, or source code :) that could help me implement this? Regards. David
RE: Support for Mongolian language
I have already checked this link. I could not find any mention of the Mongolian language. Is there any plugin available for it? -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 2:04 AM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language Check out wiki.apache.org/solr/LanguageAnalysis. For some reason the above site takes a long time to open.
Re: Problem with xpath expression in data-config.xml
Thanks for analyzing the problem. But please let me note that I came to a somewhat different conclusion. Define for the moment title to be the primary unique key:

solr-4.3.0\example\example-DIH\solr\rss\conf\schema.xml:

<uniqueKey>title</uniqueKey>

solr-4.3.0\example\example-DIH\solr\rss\conf\rss-data-config.xml:

[BAD CASE] (irrespective of the predicate @rel='self')

<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88" pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
    </entity>
  </document>
</dataConfig>

[GOOD CASE]

<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="beautybooks88" pk="title"
            url="http://beautybooks88.blogspot.com/feeds/posts/default"
            processor="XPathEntityProcessor" forEach="/feed/entry"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/feed/entry/title" />
      <field column="link" xpath="/feed/entry/link[@rel='self']/@href" />
    </entity>
  </document>
</dataConfig>

Conclusion: It has nothing to do with the number of occurrences of the pattern.
[DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hello, I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...)
_ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in SOLR
_ So, to avoid maintaining a list of files, I'm trying to generate the list with an SQL query and to give the list of results to an XPathEntityProcessor, which will read the files

The query (select DISTINCT...) generates this result:

CHEMINRELATIF
3/0/000/3001

But the problem is that with the following configuration, no request to the db is made, according to the message returned by DIH.

statusMessages: {
  "Total Requests made to DataSource": "0",
  "Total Rows Fetched": "0",
  "Total Documents Processed": "0",
  "Total Documents Skipped": "0",
  "": "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
  "Committed": "2013-05-30 10:23:30",
  "Optimized": "2013-05-30 10:23:30",

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible?

The config:

<dataConfig>
  <dataSource name="accesPCN" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb"
              user="myuser" password="mypasswd" readOnly="true"/>
  <document>
    <entity name="requeteurNomsFichiersNotices" datasource="accesPCN"
            processor="SqlEntityProcessor"
            query="select DISTINCT... SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'"
            transformer="LogTransformer"
            logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug">
      <entity name="processorDocument" processor="XPathEntityProcessor"
              url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
              transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
              logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
              datasource="accesPCN"

I'm trying to inde

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
phone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---
Re: Query syntax error: Cannot parse ....
Hi, Indeed, with the # character encoded the query works fine. Thanks -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, May 29, 2013 at 9:43 PM, bbarani wrote: # has a separate meaning in a URL. You need to encode it. http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters
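For anyone else hitting this, a minimal sketch of the encoded form (the core and field names below are made up for illustration): a raw # starts the URL fragment, so everything after it is dropped before the request ever reaches Solr, and it has to travel as %23 instead.

curl "http://localhost:8983/solr/collection1/select?q=title:C%23&wt=json"
# unencoded, q=title:C# would arrive at Solr as just q=title:C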
Re: Sorting results by last update date
Thanks Shalin... It is Solr 3.6.2. Instead of NOW, I can use today's date (I did not know about this cache issue, thanks). Later I realized that it was my mistake that confused the asc and desc ordering results. After I get data from Solr, I do a MySQL query again, where the order changes. Regards Kamal On Wed, May 29, 2013 at 2:54 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote: Hi All I am trying to sort the results as per last updated date. My url looks as below. *fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date asc* With this I get the data in ascending order of last updated date. If I am trying to sort data in descending order, I use the below url *fq=last_updated_date:[NOW-60DAY TO NOW]&fq=experience:[0 TO 588]&fq=salary:[0 TO 500] OR salary:0&fq=-bundle:job&fq=-bundle:panel&fq=-bundle:page&fq=-bundle:article&spellcheck=true&q=+java +sip&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uid&spellcheck.q=+java +sip&qf=content^40&qf=label^5.0&qf=tos_content_extra^0.1&qf=tos_name^3.0&hl.fl=content&mm=1&q.op=AND&wt=json&json.nl=map&sort=last_updated_date desc* Here the data set is not ordered properly, mostly it looks to me data is ordered on basis of score, not last updated date. Can somebody tell me what I am missing here, why *desc* is not working properly for me. What is the field type of last_update_date? Which version of Solr? A side note: Using NOW in a filter query is inefficient because it doesn't use your filter cache effectively. Round it to the nearest time interval instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter -- Regards, Shalin Shekhar Mangar.
Re: Sorting results by last update date
sort=last_updated_date desc Maybe adding %20 will help: sort=last_updated_date%20desc
Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Did you declare that field name in outer entity? Not just select as in the query. Regards, Alex On 30 May 2013 04:31, jerome.dup...@bnf.fr wrote: Hello, I want to use a index a huge list of xml file. _ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...) _ I can do it using a LineEntityProcessor reading a list of files, generated externally, but I would prefer to generate the list in SOLR _ So to avoid to mantain a list of files, I'm trying to generate the list with an sql query, and to give the list of results to XPathEntityProcessor, which will read the file The query select DISTINCT... generate this result CHEMINRELATIF 3/0/000/3001 But the problem is that with the following configuration, no request do db is done, accoring to the message returned by DIH. statusMessages:{ Total Requests made to DataSource:0, Total Rows Fetched:0, Total Documents Processed:0, Total Documents Skipped:0, :Indexing completed. Added/Updated: 0 documents. Deleted 0 documents., Committed:2013-05-30 10:23:30, Optimized:2013-05-30 10:23:30, And the log: INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully Did some has already done the kind of configuration, or is just not possible? The config: dataConfig dataSource name=accesPCN type=JdbcDataSource driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:@mymachine:myport:mydb user=myuser password=mypasswd readOnly=true/ document entity name=requeteurNomsFichiersNotices datasource=accesPCN processor=SqlEntityProcessor query=select DISTINCT... SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001' transformer=LogTransformer logTemplate=In entity requeteurNomsFichiersNotices logLevel=debug entity name=processorDocument processor=XPathEntityProcessor url=file:///D:/jed/noticesBib/$ {accesPCN.CHEMINRELATIF} xsl=xslt/mnb/IXM_MNb.xsl forEach=/record transformer=LogTransformer,fr.bnf.solr.BnfDateTransformer logTemplate=Notice fichier: $ {accesPCN.CHEMINRELATIF} logLevel=debug datasource=accesPCN I'm trying to inde Cordialement, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 téléphone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exposition Guy Debord, un art de la guerre - du 27 mars au 13 juillet 2013 - BnF - François-Mitterrand / Grande Galerie Avant d'imprimer, pensez à l'environnement.
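If it helps, here is a rough sketch of the shape that usually works for this outer-SQL / inner-XPath pattern (entity and column names are taken from the config above, the "fichiersXml" data source name is made up). Two things worth double-checking: the placeholder should use the outer entity's name, requeteurNomsFichiersNotices, rather than the data source name accesPCN, and the inner XPathEntityProcessor entity normally needs a reader-type data source such as URLDataSource rather than the JDBC one.

<dataSource name="accesPCN" type="JdbcDataSource" ... />
<dataSource name="fichiersXml" type="URLDataSource" />
<document>
  <entity name="requeteurNomsFichiersNotices" dataSource="accesPCN"
          processor="SqlEntityProcessor"
          query="select ... as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'">
    <!-- the variable refers to the outer ENTITY name, not the dataSource name -->
    <entity name="processorDocument" processor="XPathEntityProcessor"
            dataSource="fichiersXml" forEach="/record"
            url="file:///D:/jed/noticesBib/${requeteurNomsFichiersNotices.CHEMINRELATIF}">
      <!-- field mappings for the XML go here -->
    </entity>
  </entity>
</document>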
Re: Automatic cross linking
Do it outside of solr or look at update request processors. E.g. UIMA integration as an example. Regards, Alex On 30 May 2013 02:52, It-forum it-fo...@meseo.fr wrote: Hello, I'm looking to use Solr for creating cross linking in text. For exemple : I'll like to be able to request for a text field, an article, in my blog. And that Solr use a script/method, request to parse the text, find all matching categories term and caps the results. Do you have any suggestion, documentation, tutorial, source code :), that could help me to realise this optimisation. Regards. David
SPLITSHARD: time out error
Hi, I have a timeout error when I try to split a collection with 15M documents. The exception (Solr version 4.3):

542468 [catalina-exec-27] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/collections params={shard=00&action=SPLITSHARD&collection=ST-0112_replicated} status=500 QTime=300028
542469 [catalina-exec-27] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.solr.common.SolrException: splitshard the collection time out:300s
  at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
  at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
  at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
582557 [catalina-exec-39] INFO org.apache.solr.update.SolrIndexSplitter – SolrIndexSplitter: partition #1
582561 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/disk2/node00.solrcloud/solr/home/0112_replicated_00_1_replica1/data/index,segFN=segments_1,generation=1,filenames=[segments_1]
582563 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – newest commit = 1[segments_1]

How can I split my collection without this error? - Best regards
Re: multiple field join?
Solr Join is _not_ an SQL subquery and won't work like one. There's a reason it's called a pseudo join in the JIRA issues. My advice: forget joins and try to write this in pure Solr query language. The more you try to use Solr like a database, the more you'll get into trouble. De-normalize your data and try again. Best Erick On Wed, May 29, 2013 at 10:34 PM, cmd.ares cmd.a...@gmail.com wrote: http://wiki.apache.org/solr/Join I found Solr join is actually an SQL subquery; does Solr support a 3-table join? The SQL looks like this: SELECT xxx, yyy FROM collection1 WHERE outer_id IN (SELECT inner_id FROM collection1 where zzz = vvv) and outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = xxx) and outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = rrr) How do I write the Solr request URL? thanks.
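That said, since the original question asked how to write the request URL: the three IN-subqueries translate mechanically into three join filter queries with Solr 4's {!join} parser. This is a sketch only, using the placeholder field names from the SQL above; per the advice above, denormalizing will usually behave and perform better.

curl "http://localhost:8983/solr/collection1/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fl=xxx,yyy" \
  --data-urlencode "fq={!join from=inner_id to=outer_id}zzz:vvv" \
  --data-urlencode "fq={!join from=inner_id2 to=outer_id2}ttt:xxx" \
  --data-urlencode "fq={!join from=inner_id3 to=outer_id3}ppp:rrr"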
Re: Not able to search Spanish word with accent in Solr
Deep: Have you looked through the rest of the thread and tried the suggestions? If so, what were the results? Best Erick On Thu, May 30, 2013 at 2:45 AM, Deep Lotia deeplo...@gmail.com wrote: Hi, I am having a same kind of issue. I am not able to search accented characters of spanish. For eg: - Según, próximos etc. I have field called attr_content which holds the content of a PDF file whose contents are in spanish. I am using Apache Tika to index the contents of a PDF file. I have wrote a java class which using the Apache Tika classes to read the PDF contents and index it to solr 3.5. Anything which can be missed? Is it be because of encoding issues. Please help. Deep
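In case it helps while working through the thread, one common way to make Según match segun (and vice versa) is to fold accents at both index and query time; the field type below is only a sketch with a made-up name (ASCIIFoldingFilterFactory is available in Solr 3.5), and it is also worth confirming that both the PDF text and the query reach Solr as UTF-8.

<fieldType name="text_es_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Según -> segun, próximos -> proximos at both index and query time -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>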
Removing a single value from a multiValue field
I have a Solr application with a multiValue field 'tags'. All fields are indexed in this application. There exists a uniqueKey field 'id' and a '_version_' field. This is running on Solr 4.x. In order to add a tag, the application retrieves the full document, creates a PHP array from the document structure, removes the '_version_' field, and then adds the appropriate tag to the 'tags' array. This is all then sent to Solr's update method via HTTP with 'overwrite=true'. Solr correctly replaces the extant document with the new document, which is identical with the exception of a new value for the '_version_' field and an additional value in the multiValued field 'tags'. This all works correctly. I am now adding a feature where one can remove tags. I am using the same business logic, however instead of adding a value to the 'tags' array I am removing one. I can confirm that the data being sent to Solr does not contain the removed tag. However, it seems that the old value for the multiValue field is persisted, that is the old tag stays. I can see that the '_version_' field has a new value, so I see that the change was properly committed. Is there a known bug where overwriting such a doc...:

<doc>
  <arr name="tags">
    <str>a</str>
    <str>b</str>
  </arr>
</doc>

...with this doc...:

<doc>
  <arr name="tags">
    <str>a</str>
  </arr>
</doc>

...has no effect? Can multiValued fields only be added to, but not removed? Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Problem with PatternReplaceCharFilter
Just count the characters in the literal portions of the patterns and include that many spaces in the replacement. So, "TextLine" would become the same number of spaces. It gets trickier if names are variable length. But I'm sure you could come up with patterns to replace one, two, three, etc. char names with equivalent spaces. But... if all of this is too difficult for you, some people might find it easier to preprocess the data before sending it to Solr. I mean, do you really need to highlight the content in such a cryptic input format? Ultimately you might be better off with a custom char filter - sometimes people can cope better with straight Java code than cryptic regular expression sequences. -- Jack Krupansky -Original Message- From: jasimop Sent: Thursday, May 30, 2013 12:46 AM To: solr-user@lucene.apache.org Subject: Re: Problem with PatternReplaceCharFilter Honestly, I have no idea how to do that. PatternReplaceCharFilter doesn't seem to have a parameter like preservePositions=true and optionally fillCharacter= . And I don't think I can express this simply as a regex. How would I count in a pure regex the length difference before and after the match? Well, the specific problem is that when highlighting, the term positions are wrong and the result is not a valid XML structure that I can handle. I expect something like <TextLine aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" /> but I get Tex<em>tLine</em> aa="bb" cc="dd" content="the content to <em>search</em> in" ee="ff" /> Thanks for your help.
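To make the "count the characters" idea concrete, here is a sketch for a single fixed-length literal (purely illustrative; the real input would need one such rule per literal, or the custom char filter suggested above):

<analyzer>
  <!-- "<TextLine" is 9 characters and the replacement is 9 spaces, so every
       remaining character keeps its original offset and highlighting lines up -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="&lt;TextLine"
              replacement="         "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>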
Re: Support for Mongolian language
No, there is not. -- Jack Krupansky -Original Message- From: Sagar Chaturvedi Sent: Thursday, May 30, 2013 3:03 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language I have already checked this link. Could not find any hint about Mongolian language. Is there any plugin available for that? -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 2:04 AM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language Check out.. wiki.apache.org/solr/LanguageAnalysis For some reason the above site takes long time to open..
RE: Upgrade Solr index from 4.0 to 4.2.1
So having tried all combinations of LUCENE_40, 41 and 42 we're still having no success in getting our indexes to load with Solr 4.2.1... Any direction we can look into ? in our system the underlying data is very slow to re-index and would take an unreasonable amount of time at a customer site to wait for information to become available after an upgrade, so we're very hopeful there can be a way to upgrade a Lucene index properly. Thanks, Elran -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, May 22, 2013 2:25 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 LUCENE_40 since your original index was built with 4.0. As for the other, I'll defer to people who actually know what they're talking about. Best Erick On Wed, May 22, 2013 at 5:19 AM, Elran Dvir elr...@checkpoint.com wrote: My index is originally of version 4.0. My methods failed with this configuration. So, I changed solrconfig.xml in my index to both versions: LUCENE_42 and LUCENE_41. For each version in each method (loading and IndexUpgrader), I see the same errors as before. Thanks. -Original Message- From: Elran Dvir Sent: Tuesday, May 21, 2013 6:48 PM To: solr-user@lucene.apache.org Subject: RE: Upgrade Solr index from 4.0 to 4.2.1 Why LUCENE_42?Why not LUCENE_41? Do I still need to run IndexUpgrader or just loading will be enough? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, May 21, 2013 2:52 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42... Best Erick On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote: Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it. I got the following exception: May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:779) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:822) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 
10 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) at org.apache.solr.core.SolrCore.init(SolrCore.java:797) ... 13 more Caused by: org.apache.solr.common.SolrException: Error opening Reader at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:183) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:179) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) ... 15 more Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path=/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx)) at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140) at
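For what it's worth, the Lucene IndexUpgrader mentioned earlier in the thread is run per index directory from the command line, roughly like this (the jar name is illustrative, the index path is taken from the stack trace above; it rewrites every segment in the newer format, and the core must not be open while it runs):

java -cp lucene-core-4.2.1.jar org.apache.lucene.index.IndexUpgrader -verbose /var/solr/multicore_solr/other_2013-05-04/data/index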
Re: Sorting results by last update date
You can just use NOW/DAY for a filter that would only change once a day: [NOW/DAY-60DAY TO NOW/DAY] Oops... make that: [NOW/DAY-60DAY TO NOW/DAY+1DAY] Otherwise, it would miss dates after the start of today. Even better, make it: [NOW/DAY-60DAY TO *] -- Jack Krupansky -Original Message- From: Kamal Palei Sent: Thursday, May 30, 2013 5:41 AM To: solr-user@lucene.apache.org Subject: Re: Sorting results by last update date Thanks Shalini... It is solr 3.6.2 Instead of NOW, I can use today's date (I did not know this cache issue,, thanks). Later I realized , it looks it is my mistake that misleads asc and desc ordering result. After I get data from solr, again I do mysql query where the order changes again. Regards Kamal On Wed, May 29, 2013 at 2:54 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 12:10 PM, Kamal Palei palei.ka...@gmail.com wrote: Hi All I am trying to sort the results as per last updated date. My url looks as below. *fq=last_updated_date:[NOW-60DAY TO NOW]fq=experience:[0 TO 588]fq=salary:[0 TO 500] OR salary:0fq=-bundle:jobfq=-bundle:panelfq=-bundle:pagefq=-bundle:articlespellcheck=trueq=+java +sipfl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uidspellcheck.q=+java +sipqf=content^40qf=label^5.0qf=tos_content_extra^0.1qf=tos_name^3.0hl.fl=contentmm=1q.op=ANDwt=json json.nl=mapsort=last_updated_date asc * With this I get the data in ascending order of last updated date. If I am trying to sort data in descending order, I use below url *fq=last_updated_date:[NOW-60DAY TO NOW]fq=experience:[0 TO 588]fq=salary:[0 TO 500] OR salary:0fq=-bundle:jobfq=-bundle:panelfq=-bundle:pagefq=-bundle:articlespellcheck=trueq=+java +sipfl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,zm_parent_entity,ss_filemime,ss_file_entity_title,ss_file_entity_url,ss_field_uidspellcheck.q=+java +sipqf=content^40qf=label^5.0qf=tos_content_extra^0.1qf=tos_name^3.0hl.fl=contentmm=1q.op=ANDwt=json json.nl=mapsort=last_updated_date desc* Here the data set is not ordered properly, mostly it looks to me data is ordered on basis of score, not last updated date. Can somebody tell me what I am missing here, why *desc* is not working properly for me. What is the field type of last_update_date? Which version of Solr? A side note: Using NOW in a filter query is ineffecient because it doesn't use your filter cache effectively. Round it to nearest time interval instead. See http://java.dzone.com/articles/solr-date-math-now-and-filter -- Regards, Shalin Shekhar Mangar.
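As a sketch, the rounded version of the filter from the query above, with the other parameters omitted and assuming a single-core 3.x-style URL (NOW/DAY only changes once a day, so the filter cache entry stays reusable; --data-urlencode also takes care of the space before asc/desc that came up elsewhere in this thread):

curl "http://localhost:8983/solr/select" \
  --data-urlencode "q=+java +sip" \
  --data-urlencode "fq=last_updated_date:[NOW/DAY-60DAY TO *]" \
  --data-urlencode "sort=last_updated_date desc" \
  --data-urlencode "wt=json"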
Re: Reindexing strategy
On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and something you hopefully already know: You'll want to get baseline measurements before you begin testing for comparison. Thanks. I wasn't looking for hard numbers, but rather for what the signs of problems are. I know to keep my eye on memory and CPU, but I have no idea how to check disk I/O, and I'm not even sure how to determine if it becomes saturated. One of the most reliable Solr-specific indicators of pushing your hardware too hard is that the QTime on your queries will start to increase dramatically. Solr 4.1 and later has more granular query time statistics in the UI - the median and 95% numbers are much more important than the average. Thank you, this will help. At least I now have a hard metric to see when Solr is getting overburdened (QTime). Outside of that, if your overall IOwait CPU percentage starts getting near (or above) 30-50%, your server is struggling. If all of your CPU cores are staying near 100% usage, then it's REALLY struggling. I see, thanks. Assuming you have plenty of CPU cores, using fast storage and having plenty of extra RAM will alleviate much of the I/O bottleneck. The usual rule of thumb for good query performance is that you need enough RAM to put 50-100% of your index in the OS disk cache. For blazing performance during a rebuild, that becomes 100-200%. If you had 150%, that would probably keep most indexes well-cached even during a rebuild. A rebuild will always lower performance, even with lots of RAM. Considering that the Solr index is the only place that the data is stored, and that users are actively using the system, I was not planning on a rebuild but rather to iteratively reindex the extant documents, even as new documents are being pushed in. My earlier reply to your other message has some other ideas that will hopefully help. Thank you Shawn! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
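For the disk I/O question, the stock Linux tools are usually enough to see whether the disks are the bottleneck (a sketch; iostat comes from the sysstat package, vmstat and top from procps):

iostat -x 5   # per-device utilisation and overall %iowait, refreshed every 5 seconds
vmstat 5      # the 'wa' column is CPU time spent waiting on I/O
top           # the 'wa' figure in the Cpu(s) line shows the same at a glance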
Re: What exactly happens to extant documents when the schema changes?
On Wed, May 29, 2013 at 5:09 PM, Shawn Heisey s...@elyograg.org wrote: I handle this in a very specific way with my sharded index. This won't work for all designs, and the precise procedure won't work for SolrCloud. There is a 'live' and a 'build' core for each of my shards. When I want to reindex, the program makes a note of my current position for deletes, reinserts, and new documents. Then I use a DIH full-import from mysql into the build cores. Once the import is done, I run the update cycle of deletes, reinserts, and new documents on those build cores, using the position information noted earlier. Then I swap the cores so the new index is online. I do need to examine sharding and multiple cores. I'll look into that, thank you. By the way, don't google for DIH! It took me some time to figure out that it is DataImportHandler, as some people use the acronym for something completely different. To adapt this for SolrCloud, I would need to use two collections, and update a collection alias for what is considered live. To control the I/O and CPU usage, you might need some kind of throttling in your update/rebuild application. I don't need any throttling in my design. Because I'm using DIH, the import only uses a single thread for each shard on the server. I've got RAID10 for storage and half of the CPU cores are still available for queries, so it doesn't overwhelm the server. The rebuild does lower performance, so I have the other copy of the index handle queries while the rebuild is underway. When the rebuild is done on one copy, I run it again on the other copy. Right now I'm half-upgraded -- one copy of my index is version 3.5.0, the other is 4.2.1. Switching to SolrCloud with sharding and replication would eliminate this flexibility, unless I maintained two separate clouds. Thank you. I am not using Solr Cloud but if I ever consider it, then I will keep this in mind. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
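For reference, the swap at the end of the rebuild described above is just the CoreAdmin SWAP action; a sketch with hypothetical core names:

# after the build core finishes indexing, make it the live one
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_live&other=shard1_build"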
Re: Removing a single value from a multiValue field
First, you cannot do any internal editing of a multi-valued list, other than: 1. Replace the entire list. 2. Add values on to the end of the list. But you can do both of those operations on a single multivalued field with atomic update, without reading and writing the entire document. Second, there is no "arr" element in the Solr Update XML format, only "field". To simply replace the full, current value of one multi-valued field:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="set">a</field>
    <field name="tags" update="set">b</field>
  </doc>
</add>

If you simply want to append a couple of values:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="add">a</field>
    <field name="tags" update="add">b</field>
  </doc>
</add>

To empty out a multivalued field:

<add>
  <doc>
    <field name="id">doc-id</field>
    <field name="tags" update="set" null="true" />
  </doc>
</add>

-- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, May 30, 2013 7:55 AM To: solr-user@lucene.apache.org Subject: Removing a single value from a multiValue field
Re: Sorting results by last update date
I wrote "Otherwise, it would miss dates after the start of today", but that should be "Otherwise, it would miss documents with times after the start of today if the current time is before noon". But use * and you will be better off anyway. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Thursday, May 30, 2013 8:27 AM To: solr-user@lucene.apache.org Subject: Re: Sorting results by last update date
Re: Problem with xpath expression in data-config.xml
Ah, I missed that part. The problem you have is because you have forEach="/feed/entry" but you want to read /feed/link as a common field. You need to have forEach="/feed | /feed/entry", which should let you have both /feed/link as well as /feed/entry/link. -- Regards, Shalin Shekhar Mangar.
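Applied to the "bad case" config earlier in the thread, that would look roughly like this (sketch only; with the extra /feed row in forEach, the commonField value read from /feed/link is carried over onto the entry rows):

<entity name="beautybooks88" pk="title"
        url="http://beautybooks88.blogspot.com/feeds/posts/default"
        processor="XPathEntityProcessor"
        forEach="/feed | /feed/entry"
        transformer="DateFormatTransformer">
  <field column="title" xpath="/feed/entry/title" />
  <field column="source-link" xpath="/feed/link[@rel='self']/@href" commonField="true" />
</entity>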
Re: Removing a single value from a multiValue field
On Thu, May 30, 2013 at 3:42 PM, Jack Krupansky j...@basetechnology.com wrote: First, you cannot do any internal editing of a multi-valued list, other than: 1. Replace the entire list. 2. Add values on to the end of the list. Thank you. I meant that I am actually editing the entire document. Reading it, changing the values that I need, and then 'updating' it. I will look into updating only the single multiValued field. But you can do both of those operations on a single multivalued field with atomic update without reading and writing the entire document. Second, there is no arr element in the Solr Update XML format. Only field. To simply replace the full, current value of one multi-valued field: add doc field name=iddoc-id/field field name=tags update=seta/field field name=tags update=setb/field /doc /add If you simply want to append a couple of values: add doc field name=iddoc-id/field field name=tags update=adda/field field name=tags update=addb/field /doc /add To empty out a multivalued field: add doc field name=iddoc-id/field field name=tags update=set null=true / /doc /add Thank you. I will see about translating that into the JSON format that I work with. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Removing a single value from a multiValue field
You gave an XML example, so I assumed you were working with XML! In JSON...

[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]

and

[{"id": "doc-id", "tags": {"set": null}}]

BTW, this kind of stuff is covered in the book, with separate chapters for XML and JSON, each with dozens of examples like this. -- Jack Krupansky -Original Message- From: Dotan Cohen Sent: Thursday, May 30, 2013 9:36 AM To: solr-user@lucene.apache.org Subject: Re: Removing a single value from a multiValue field
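To actually remove a tag this way, send the full new value of the field as a "set" (a sketch, assuming the default core name collection1; atomic updates also require an updateLog, the _version_ field, and that the other fields be stored):

curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H 'Content-type:application/json' \
  -d '[{"id": "doc-id", "tags": {"set": ["a"]}}]'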
Solr 4.3, Tomcat, Error filterStart
I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried with both a binary distro with the existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up Tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And Solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but none of the suggestions I found actually worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Re: Solr 4.3, Tomcat, Error filterStart
Hi Jonathan, Did you find http://stackoverflow.com/questions/3016808/tomcat-startup-logs-severe-error-filterstart-how-to-get-a-stack-trace ? Steve On May 30, 2013, at 10:10 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried using the Tried with both a binary distro with existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but nothing I found suggested succesfully worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Re: Solr 4.3, Tomcat, Error filterStart
Usually tomcat errors with Solr 4.3 happen due to uncopied logging libraries. I would check if installing Solr 4.2.1 works and/or copy additional libraries in (search mailing list for this issue). However, I am not entirely sure that's the case here. It feels that perhaps the definition of the handler could be a bigger issue here. I assume you have an xml file somewhere that defines that /solr maps to solr.war. I would double check that. Maybe try to deploy something smaller and easier and see what the difference is. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:10 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried using the Tried with both a binary distro with existing solr.war, and with compiling my own solr.war. * Solr 4.3.0 * Tomcat 6.0.29 * JVM 1.6 When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors And solr is not actually deployed, naturally. I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info (maybe a stack trace?) to give you more clues what's failing -- but nothing I found suggested succesfully worked to turn up logging. So I'm at a bit of a loss. Any suggestions? Any ideas what might be causing this error, and/or how to get more information on what's causing it?
Fwd: indexing only selected fields
-- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new to Solr and need your advice... Does anybody know how to index not all the fields in an uploaded document, but only those which I mentioned in the schema, and just ignore the other fields and symbols? Is it possible?
Re: Solr 4.3, Tomcat, Error filterStart
I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
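In practice the fix usually amounts to copying the logging jars and configuration that were pulled out of the 4.3 war back onto Tomcat's classpath, roughly like this (paths are illustrative; see the SolrLogging page above for the authoritative steps):

# SLF4J/log4j jars that no longer ship inside solr.war as of 4.3
cp solr-4.3.0/example/lib/ext/*.jar $CATALINA_HOME/lib/
# a log4j configuration somewhere on the classpath
cp solr-4.3.0/example/resources/log4j.properties $CATALINA_HOME/lib/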
Re: indexing only selected fields
How are you submitting your document? Some methods automatically ignore unknown fields, others complain. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Re: Fwd: indexing only selected fields
-- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ??? This should be exactly how Solr works. The only way that you would get fields not explicitly mentioned in your schema is if they match a dynamic field wildcard ... but that would also be in your schema, so it doesn't change what I'm saying. Thanks, Shawn
Re: indexing only selected fields
Alex, thank you for the answer. I am submitting by the POST method via curl... For example, when I want to submit a document I type in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @base.info -H 'Content-type:application/json' where base.info is my file with the information which I want to index. Could you tell me in which ways (methods) I can automatically omit unknown fields? It would be easier to select only the needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
RE: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
I don't want to dissuade you from trying, but I believe FileListEntityProcessor has something special coded up into it to allow for its unique usage. Not sure if your approach isn't do-able. I would imagine that fixing FLEP to handle a row-at-a-time or page-at-a-time in memory wouldn't be terribly hard, but I haven't looked either. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Thursday, May 30, 2013 6:08 AM To: solr-user@lucene.apache.org Subject: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor Did you declare that field name in the outer entity? Not just select it in the query. Regards, Alex On 30 May 2013 04:31, jerome.dup...@bnf.fr wrote: Hello, I want to index a huge list of xml files. _ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...) _ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in SOLR. _ So, to avoid maintaining a list of files, I'm trying to generate the list with an sql query and to give the list of results to XPathEntityProcessor, which will read the files. The query (select DISTINCT...) generates this result: CHEMINRELATIF 3/0/000/3001 But the problem is that with the following configuration, no request to the db is done, according to the message returned by DIH:

statusMessages: { Total Requests made to DataSource: 0, Total Rows Fetched: 0, Total Documents Processed: 0, Total Documents Skipped: 0, : Indexing completed. Added/Updated: 0 documents. Deleted 0 documents., Committed: 2013-05-30 10:23:30, Optimized: 2013-05-30 10:23:30,

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible? The config:

<dataConfig>
  <dataSource name="accesPCN" type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb" user="myuser" password="mypasswd" readOnly="true"/>
  <document>
    <entity name="requeteurNomsFichiersNotices" datasource="accesPCN" processor="SqlEntityProcessor"
            query="select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'"
            transformer="LogTransformer" logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug">
      <entity name="processorDocument" processor="XPathEntityProcessor"
              url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
              transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
              logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
              datasource="accesPCN"/>
    </entity>
  </document>
</dataConfig>

I'm trying to index... Regards, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 phone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exhibition Guy Debord, un art de la guerre - 27 March to 13 July 2013 - BnF - François-Mitterrand / Grande Galerie Before printing, think of the environment.
Re: Fwd: indexing only selected fields
Update Request Processors to the rescue! Example - Ignore input values for any undefined fields. Add to solrconfig:

<updateRequestProcessorChain name="ignore-undefined">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Index content:

curl "http://localhost:8983/solr/update?commit=true&update.chain=ignore-undefined" \
 -H 'Content-type:application/json' -d '
[{"id": "doc-1", "title": "Hello World", "features": ["Fast", "Cheap"], "bad_field_name": "Junk", "abstract": "Not in schema either"}]'

Results:

"id":"doc-1", "title":["Hello World"], "features":["Fast", "Cheap"],

(From the book!) -- Jack Krupansky -Original Message- From: Igor Littig Sent: Thursday, May 30, 2013 10:39 AM To: solr-user@lucene.apache.org Subject: Fwd: indexing only selected fields -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Re: indexing only selected fields
If you just want to remove anything that does not match, then the 'ignored' field type in the example schema would work. If you want to ignore specific fields but complain about any unexpected ones, you can still declare those specific fields but give them the ignored type. Or you could use Update Request Processors, like this one: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:55 AM, Igor Littig igor.lit...@gmail.com wrote: Alex Thank you for the answer. I am submitting by POST method via curl... For example when I want to submit a document I'm typing in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @ base.info -H 'Content-type:application/json' where base.info my file with information which I want to index. Could you in which ways(methods) I can automatically omit unknown fields. It would be easier to select only needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
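For reference, the 'ignored' setup Alex refers to looks roughly like this in the stock example schema.xml (a sketch; exact attributes can differ between Solr versions):

<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<dynamicField name="*" type="ignored" multiValued="true" />

With that catch-all dynamicField in place, any field name that is not explicitly declared falls through to the ignored type and is silently dropped at index time.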
Re: Solr 4.3, Tomcat, Error filterStart
Thanks! I guess I should have asked on-list BEFORE wasting 4 hours fighting with it myself, but I was trying to be a good user and do my homework! Oh well. Off to the logging instructions, hope I can figure them out -- if you could update the tomcat instructions with the simplest possible way to get deploy in Tomcat to work, that'd def be helpful! On 5/30/2013 10:41 AM, Shawn Heisey wrote: I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
Re: SPLITSHARD: time out error
Shard splitting is buggy in 4.3. I recommend that you wait for the next release (4.3.1) before using this feature. That being said, the split is executed by the Overseer and will continue to happen even after the http request times out. There aren't enough hooks to monitor the progress of the operation. You can look at ZooKeeper clusterstate to see if the sub shards are up and running. In your case, the sub shards will be called 00_0 and 00_1 and should be in active state (both shardState and state attribute in zk should be active). On Thu, May 30, 2013 at 4:46 PM, yriveiro yago.rive...@gmail.com wrote: Hi, I have a time out error when I try to split a collection with 15M documents The exception (solr version 4.3): 542468 [catalina-exec-27] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/collections params={shard=00action=SPLITSHARDcollection=ST-0112_replicated} status=500 QTime=300028 542469 [catalina-exec-27] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.solr.common.SolrException: splitshard the collection time out:300s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166) at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) 582557 [catalina-exec-39] INFO org.apache.solr.update.SolrIndexSplitter – SolrIndexSplitter: partition #1 582561 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/disk2/node00.solrcloud/solr/home/0112_replicated_00_1_replica1/data/index,segFN=segments_1,generation=1,filenames=[segments_1] 582563 [catalina-exec-39] INFO org.apache.solr.core.SolrCore – newest commit = 1[segments_1] How I can split my collection without this error? 
- Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/SPLITSHARD-time-out-error-tp4066991.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
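One way to check the sub shard state Shalin describes is to read clusterstate.json straight out of ZooKeeper with the standard ZooKeeper CLI (host and port below are assumptions; point it at your own ensemble):

zkCli.sh -server localhost:2181
# then, at the ZooKeeper prompt:
get /clusterstate.json
# look for the ST-0112_replicated_00_0 and _00_1 entries and check that the
# shardState/state attributes Shalin mentions are both "active"

The same JSON is also visible in the Cloud section of the Solr admin UI.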
Re: Solr 4.3, Tomcat, Error filterStart
I'm going to add a note to http://wiki.apache.org/solr/SolrLogging , with the Tomcat sample Error filterStart error, as an example of something you might see if you have not set up logging. Then at least in the future, googling solr tomcat error filterStart might lead someone to the clue that it might be logging. On 5/30/2013 10:41 AM, Shawn Heisey wrote: I am trying to get Solr installed in Tomcat, and having trouble. When I start up tomcat, I get in the Tomcat log: INFO: Deploying web application archive solr.war May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 29, 2013 3:59:40 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors I've tried to google for advice on this -- mostly what I found was suggestions for how to turn up logging to get more info In a cruel twist of fate, it is actually logging changes that are preventing Solr from starting. The required steps for deploying 4.3 changed. I will update the wiki page about tomcat when I'm not on a train. See this page for additional instructions, specifically the section about deploying on containers other than jetty: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
Re: Solr 4.3, Tomcat, Error filterStart
On 5/30/2013 9:26 AM, Jonathan Rochkind wrote: Thanks! I guess I should have asked on-list BEFORE wasting 4 hours fighting with it myself, but I was trying to be a good user and do my homework! Oh well. Off to the logging instructions, hope I can figure them out -- if you could update the tomcat instructions with the simplest possible way to get deploy in Tomcat to work, that'd def be helpful! Commute done. I'm not a tomcat user, so the only thing I know about where to drop those jars and properties file is tomcat/lib ... do you have anything more specific that I can include in the wiki page? In particular, I'd like to know if there are any particular config files or other specific information I can list to help the reader locate where tomcat/lib lives. I suppose I can put what I do know and let someone with better knowledge update it. Thanks, Shawn
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hi, Thanks for your answer, it helped me move forward. The name of the entity was not good, not consistent with the schema. Now the first entity works fine: the query is sent to the database and returns the correct result. The problem is that the second entity, which is an XPathEntityProcessor entity, doesn't read the file specified in the url attribute, but tries to execute it as an sql query on my database. I tried to put a fake query (select 1 from dual) but it changes nothing. It's as if the XPathEntityProcessor entity behaved like an SqlEntityProcessor, using the url attribute instead of the query attribute. I forgot to say which version I use: SOLR 4.2.1 (this can be changed, it's just the beginning of the development). See below the config and the returned message.

The verbose output:

verbose-output:[ entity:noticebib,[ query,select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001', time-taken,0:0:0.141, null,--- row #1-, CHEMINRELATIF,3/0/000/3001.xml, null,-, entity:processorDocument,[ document#1,[ query,file:///D:/jed/noticesbib/3/0/000/3001.xml, EXCEPTION,org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: file:///D:/jed/noticesbib/3/0/000/3001.xml Processing Document # 1\r\n\tat org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow (DataImportHandlerException.java:71)\r\n\tat ... oracle.jdbc.driver.OracleStatementWrapper.execute (OracleStatementWrapper.java:1203)\r\n\tat org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init (JdbcDataSource.java:246)\r\n\t... 32 more\r\n, time-taken,0:0:0.124,

This is the configuration:

<dataSource name="accesPCN" ...my oracle ds definition... />
<dataSource name="racineNoticeDatasource" baseUrl="file:///D:/jed/noticesBib" type="URLDataSource" encoding="UTF-8"/>
<document>
  <entity name="noticebib" datasource="accesPCN" processor="SqlEntityProcessor" rootEntity="false"
          query="select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' || SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' || to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'">
    <field column="CHEMINRELATIF" name="CHEMINRELATIF" />
    <entity name="processorDocument" processor="XPathEntityProcessor" datasource="racineNoticeDatasource"
            url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}" query="SELECT 1 from dual"
            xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
            transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
            logTemplate="Notice fichier: ${noticebib.CHEMINRELATIF}" logLevel="debug"/>
  </entity>
</document>

Regards, --- Jérôme Dupont Bibliothèque Nationale de France Département des Systèmes d'Information Tour T3 - Quai François Mauriac 75706 Paris Cedex 13 phone: 33 (0)1 53 79 45 40 e-mail: jerome.dup...@bnf.fr --- Exhibition Guy Debord, un art de la guerre - 27 March to 13 July 2013 - BnF - François-Mitterrand / Grande Galerie Before printing, think of the environment.
Pivot Facets refining datetime, bleh
I've been trying to get into how distributed field facets do their work, but I haven't been able to uncover how they deal with this issue. Currently distrib pivot facets does a getTermCounts(first_field) to populate a list at the level it's working on. When putting together the data structure we set up a BytesRef, fill it in with the value using the FieldType.ReadableToIndexed call, and then add the FieldType.ToObject of that bytesRef and associated field. --From getTermCounts comes fieldValue-- termval = new BytesRef(); ftype.readableToIndexed(fieldValue, termval); pivot.add( value, ftype.toObject(sfield, termval) ); This works great for everything but datetime, as datetime's .ToObject turns it into a human readable string that is unconvertible - at least in my investigation. I've tried to use the FieldType.ToInternal but that also fails on the human readable datetime format. My original idea was to skip the aforementioned block of code and just add the fieldValue straight to the data structure. This caused some pivot facet tests to return wonky results, and I'm not sure if I should go down the path of trying to figure out those problems or if there is a different approach I should be taking. Any general guidance on how distributed field facets deal with this would be much appreciated.
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
On Thu, May 30, 2013 at 11:44 AM, jerome.dup...@bnf.fr wrote: entity name=processorDocument processor=XPathEntityProcessor datasource=racineNoticeDatasource url=file:///D:/jed/noticesbib/$ {noticebib.CHEMINRELATIF} I've seen this one before. 'dataSource' is case sensitive, you said 'datasource'. DIH does not complain but instead just picks up the default (first?) processor which happens to be SQL one. Change one letter, see if it fixes it. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
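For completeness, the inner entity with that one-letter fix applied would read (taken from Jérôme's config, untested):

<entity name="processorDocument" processor="XPathEntityProcessor" dataSource="racineNoticeDatasource"
        url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}"
        xsl="xslt/mnb/IXM_MNb.xsl" forEach="/record"
        transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
        logTemplate="Notice fichier: ${noticebib.CHEMINRELATIF}" logLevel="debug"/>

With dataSource spelled with a capital S, DIH should resolve the URLDataSource named racineNoticeDatasource instead of falling back to the JDBC one, and the fake query attribute should no longer be needed.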
Re: indexing only selected fields
Ok, that is clear. Thanks fo the answer 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com If you want to just removing anything that does not match then 'ignored' field type in example schema would work. If you want to ignore specific fields but complain on any unexpected things you can still use specific fields but with ignored type. Or you could use Update Request Processors like this one: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:55 AM, Igor Littig igor.lit...@gmail.com wrote: Alex Thank you for the answer. I am submitting by POST method via curl... For example when I want to submit a document I'm typing in the command line: curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @ base.info -H 'Content-type:application/json' where base.info my file with information which I want to index. Could you in which ways(methods) I can automatically omit unknown fields. It would be easier to select only needed fields. Cheers Igor 2013/5/30 Alexandre Rafalovitch arafa...@gmail.com How are you submitting your document? Some methods automatically ignore unknown fields, other complaint. In any case, there is always a way to define an ignored field type. The schema.xml in the main example shows how to do it. Search for 'ignored'. But beware that this will hide all spelling and other errors later.. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:39 AM, Igor Littig igor.lit...@gmail.com wrote: -- Forwarded message -- From: Igor Littig igor.lit...@gmail.com Date: 2013/5/30 Subject: indexing only selected fields To: solr-user-...@lucene.apache.org Hello everyone. I'm quite new in Solr and need your advice... Does anybody know how to index not all fields in an uploading document but only those which I mentioned in the schema, others fields and symbols just ignore. Is it possible ???
Rollback from Solr4.2.1 to Solr3.5
Hi, We recently had a production release to upgrade our Solr 3.5 to Solr 4.2.1 (no schema changes except some basic ones required for 4.2.1). The nature of our documents is that we have huge multivalued fields; they can go from 1000 to 100K values in one single field.

# Documents: 300K
# Index size: 9GB (all fields are stored and 5 are indexed)
# JVM Heap: 4GB

We haven't seen more than 10% CPU and 60% JVM heap during our usage, where we get 7K to 10K requests per min for this server. After upgrading to 4.2.1 we saw the CPU spike to a constant 75% and heap usage grow to 95% within 5 mins of traffic. Later the server becomes slow to unresponsive and we start seeing connection timeouts. We did a couple of adjustments to the JVM heap but still couldn't get it resolved, and had to roll back to 3.5 as we were exceeding our deployment window. During our investigation we identified that the queries causing the problem are the ones fetching the huge multivalued fields. Decompressing is killing the server. I have reported this issue earlier, which happened to be fixed in 4.2.1, but I'm not sure if there is another side effect of compressed fields that still remains. Any advice is much appreciated. thanks Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Rollback-from-Solr4-2-1-to-Solr3-5-tp4067094.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: java.lang.IllegalAccessError when invoking protected method from another class in the same package path but different jar.
Hoss, thanks a lot for the explanation. We override most of the methods of query component(prepare,handleResponses,finishStage etc..) to incorporate custom logic and we set the _responseDocs values based on custom logic (after filtering out few data) and then we call the parent(super) method(query component) with the modified responsedocs. Thats the main reason we are using the _responsedocs variable as is.. -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-IllegalAccessError-when-invoking-protected-method-from-another-class-in-the-same-package-p-tp4066904p4067086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4.3: write.lock is not removed
How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
Continue Indexing Documents when single doc does not match schema
I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using Nutch's solrindex to index documents into Solr. When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). Ideally, I would like this document to be skipped, an error written to the log file for later investigation, and the indexing of the remainder of the parsed documents to continue. Instead the job fails. I have tried setting <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError> in solrconfig.xml and restarting tomcat, but that does not seem to make a difference. Where else should I be looking?
solr 3.6 use only one CPU
We have a solr instance running on a 4 CPU box. Sometimes, we send a query to our solr server and it takes up 100% of one CPU and 60% of memory. I assume that if we send another query request, solr should be able to use another idling CPU. However, that is not the case. Using top, I only see one cpu busy, and the client side just gets stuck. Is solr 3.6 able to do multithreading to process requests? Ming-
Re: Continue Indexing Documents when single doc does not match schema
On 5/30/2013 11:03 AM, Iain Lopata wrote: When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). Ideally, I would like this document to be skipped, an error written to the log file for later investigation, and the indexing of the remainder of the parsed documents to continue. Instead the job fails. I have tried setting abortOnConfigurationError${solr.abortOnConfigurationError:false}/abortOnC onfigurationError in solrconfig.xml and restarting tomcat, but that does not seem to make a difference. That config option just tells Solr whether or not initial startup should fail if there's a configuration error in config files like solrconfig.xml. In most cases, you want it to be true. I don't think anything currently exists to do what you want. The feature request issue has been around for a long time, and it's had some relatively recent activity, at least compared to its creation date: https://issues.apache.org/jira/browse/SOLR-445 I haven't looked at the patch, but I would imagine that it just needs to be updated for the many source code changes since it was created, then examined to make sure it's correctly implemented. Thanks, Shawn
Re: solr 3.6 use only one CPU
On 5/30/2013 11:12 AM, Mingfeng Yang wrote: We have a solr instance running on a 4 CPU box. Sometimes, we send a query to our solr server and it take up 100% of one CPU and 60% of memory. I assume that if we send another query request, solr should be able to use another idling CPU. However, it is not the case. Using top, I only see one cpu is busy, and the client side just gets stucked. Is solr 3.6 able to do multithreading to process requests? Solr is completely multithreaded, and has been for as long as I've been using it, which started with version 1.4.0. If you only send it one request at a time, it will only use one CPU. Your client code must be multithreaded as well. I don't have enough information to tell you whether your server is sized appropriately for your index. Here's some general information: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
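To illustrate the client side, here is a minimal sketch of firing queries concurrently with SolrJ; HttpSolrServer is the 4.x client class (in 3.6 the equivalent is CommonsHttpSolrServer), and the URL and thread count are assumptions:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ParallelQueries {
    public static void main(String[] args) throws Exception {
        // one shared, thread-safe client; each submitted task is an independent request
        final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // concurrent requests let Solr spread work across CPUs
                        long hits = solr.query(new SolrQuery("*:*")).getResults().getNumFound();
                        System.out.println("numFound=" + hits);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}

With only one outstanding request at a time, only one CPU will ever be busy, no matter how many cores the box has.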
Find rows within range of other rows
I need to do a query where I need to find all people who have done 2 events within a range. I currently log one row per event. Example:

Person,Date,ViewedUrl
1,2012May10,google.com
2,2012May10,yahoo.com
1,2012May13,yahoo.com
2,2012May13,google.com

A sample request would be wanting to find all people who viewed yahoo.com within a week of viewing google.com, so I would want to return 1 group of values for person 1. Any ideas? Thanks, Mike
Re: Continue Indexing Documents when single doc does not match schema
On Thu, May 30, 2013 at 1:03 PM, Iain Lopata ilopa...@hotmail.com wrote: For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). No help on the core topic, but a workaround for the specific situation could be: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
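If that processor fits, a hedged sketch of wiring it into solrconfig.xml (the chain name and the address field are just illustrations from this thread, not a tested config):

<updateRequestProcessorChain name="first-address-only">
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">address</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

It keeps only the first value seen for the selected field, so the document indexes instead of failing the multivalued check, but you should make sure silently dropping the extra addresses is acceptable for your data.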
RE: solr 4.3: write.lock is not removed
Hi, We just use CURL from PHP code to submit indexing requests, like: /update?commit=true... This worked well in solr 3.6.1. I saw the link you showed and really appreciate it (if there is no other choice I will change the java source code, but I hope there is a better way). Thanks very much for your help, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: solr 4.3: write.lock is not removed
I did more tests and get more info: the basic setting is that we created core from PHP CURl API where we define: schema config instanceDir=my_solr_home dataDir=my_solr_home/data/new_collection_name In solr 3.6.1 we donot need to define schema/config because conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use /update?commit=true..) 2/ Shutdown tomcat, I saw write.lock is gone 3/ Restart Tomcat, indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for helps, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit indexing request, like: /update?commit=true.. This worked well in solr 3.6.1. I saw the link you showed and really appreciate (if no other choice I will change java source code but hope there is a better way..)? Thanks very much for helps, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using indexing program? The below post discusses the same issue.. http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: multiple field join?
: My advice. Forget joins and try to write this in pure : Solr query language. The more you try to use Solr like : a database, the more you'll get into trouble. De-normalize : your data and try again. with that important caveat in mind, it is worth noting that what you are essentially asking about is using multiple filters, each containing a distinct join query... : outer_id IN (SELECT inner_id FROM collection1 where zzz = vvv) : and : outer_id2 IN (SELECT inner_id2 FROM collection1 where ttt = xxx) : and : outer_id3 IN (SELECT inner_id3 FROM collection1 where ppp = rrr)

?q=*:*
fq={!join from=inner_id to=outer_id}zzz:vvv
fq={!join from=inner_id2 to=outer_id2}ttt:xxx
fq={!join from=inner_id3 to=outer_id3}ppp:rrr

-Hoss
Re: solr 4.3: write.lock is not removed
: I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that after finishing : indexing : : write.lock : : is NOT removed. Later if I index again it still works OK. Only after I shutdown Tomcat : then write.lock is removed. This behavior caused some problem like I could not use luke : to observe indexed data. IIRC, this was an intentional change. In older versions of Solr the IndexWriter was only opened if/when updates needed to be made, but that made it impossible to safely take advantage of some internal optimizations related to NRT IndexReader reloading, so the logic was modified to always keep the IndexWriter open as long as the SolrCore is loaded. In general, your past behavior of pointing luke at a live solr index could have also produced problems if updates came into solr while luke had the write lock active. -Hoss
Re: Grouping results based on the field which matched the query
: I wanted to know if Solr has some functionality to group results based on : the field that matched the query. : : So if I have id, name and manufacturer in my document structure, I want to : know how many results are there because its manufacturer matched the q and : how many results are there because q matched the name field. there's a difference between *grouping* results by a query, and *counting* which subset of your results match your query. in general, it sounds like you are probably currently using something like dismax or edismax to search across multiple fields, ala... ? defType=dismax qf=name manufacturer q=user input if you want to count how many of those docs match the user input in either name or manufacturer, you can use facet.query and take advantage of local params to refer back to the user's main query input... facet=true facet.query={!field f=manufacturer v=$q} facet.query={!field f=name v=$q} ...however it's important to note that those counts won't necessarily add up to your numFound because some docs may match on multiple fields ... you may also not get any counts if your main query string is something complex, in which case you may want to ignore the local param (v=$q) and explicitly specify what the various facet.queries are. likewise, if you truly want to *group* the results based on querying on a specific field, you can use group.query instead... https://wiki.apache.org/solr/FieldCollapsing -Hoss
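To make the group.query variant concrete, the request could look roughly like this (a sketch reusing the dismax parameters above, with 'user input' standing in for the real query string):

?defType=dismax
 qf=name manufacturer
 q=user input
 group=true
 group.query=manufacturer:(user input)
 group.query=name:(user input)

Each group.query then comes back as its own group, with its own numFound and top documents.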
Collections API Reload killing my cloud
Everytime I try to do a reload using the collections API my entire cloud goes down and I cannot search it. The solrconfig.xml and schema.xml are good because when I just restart tomcat everything works fine. Here is the output of the collections api reload command: 59155087 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-00 message:{ operation:reloadcollection, name:productindex} 59155098 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Executing Collection Cmd : action=RELOAD 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-1:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-4:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155100 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-2:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155102 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-5:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155103 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-3:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155105 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-6:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155108 [http-bio-8080-exec-7] INFO org.apache.solr.core.CoreContainer – Reloading SolrCore 'productindex' using instanceDir: /srv/solr/productindex 59155109 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Check for collection zkNode:productindex 59155111 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Collection zkNode exists 59155112 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Load collection config from:/collections/productindex 59155114 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/srv/solr/productindex/' 59155166 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader 59155167 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/metadata-extractor-2.6.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 
'file:/srv/solr/contrib/extraction/lib/vorbis-java-core-0.1.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-3.8.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/rome-0.9.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/jdom-1.0.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-ooxml-schemas-3.8.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/commons-compress-1.4.1.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/icu4j-49.1.jar' to
Re: Solr 4.3, Tomcat, Error filterStart
Okay, sadly, I still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors This is very frustrating. I have no way to even be sure this problem really is logging related, although it seems likely. But I feel like I'm just randomly moving chairs around and hoping the error will go away, and it does not. Is there anyone that has successfully run Solr 4.3.0 in a Tomcat 6? Can we even confirm this is possible? Can anyone give me any other hints? In particular, does anyone have any idea how to get more logging out of Tomcat than the fairly useless Error filterStart? The only reason I'm using tomcat is that it's what we have always used in our current Solr 1.4-based application, for reasons lost to time. I was hoping to upgrade to Solr 4.3 without simultaneously switching our infrastructure from tomcat to jetty, to change one thing at a time. I suppose I might need to abandon that and switch to jetty too, but I'd rather not.
Re: Collections API Reload killing my cloud
https://issues.apache.org/jira/browse/SOLR-4805 - Mark On May 30, 2013, at 3:09 PM, davers dboych...@improvementdirect.com wrote: Everytime I try to do a reload using the collections API my entire cloud goes down and I cannot search it. The solrconfig.xml and schema.xml are good because when I just restart tomcat everything works fine. Here is the output of the collections api reload command: 59155087 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-00 message:{ operation:reloadcollection, name:productindex} 59155098 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Executing Collection Cmd : action=RELOAD 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-1:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155099 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-4:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155100 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-2:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155102 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-5:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155103 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-3:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155105 [Overseer-89776537554780160-solr-shard-4:8080_solr-n_00] INFO org.apache.solr.cloud.OverseerCollectionProcessor – Collection Admin sending CoreAdmin cmd to solr-shard-6:8080/solr params:action=RELOADcore=productindexqt=%2Fadmin%2Fcores 59155108 [http-bio-8080-exec-7] INFO org.apache.solr.core.CoreContainer – Reloading SolrCore 'productindex' using instanceDir: /srv/solr/productindex 59155109 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Check for collection zkNode:productindex 59155111 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Collection zkNode exists 59155112 [http-bio-8080-exec-7] INFO org.apache.solr.cloud.ZkController – Load collection config from:/collections/productindex 59155114 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/srv/solr/productindex/' 59155166 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader 59155167 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/metadata-extractor-2.6.2.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader 
59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/vorbis-java-core-0.1.jar' to classloader 59155168 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-3.8.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/rome-0.9.jar' to classloader 59155169 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/jdom-1.0.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/poi-ooxml-schemas-3.8.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/srv/solr/contrib/extraction/lib/commons-compress-1.4.1.jar' to classloader 59155170 [http-bio-8080-exec-7] INFO org.apache.solr.core.SolrResourceLoader – Adding
Re: Collections API Reload killing my cloud
Is it possible that this has something do do with it? 59157032 [Thread-2] INFO org.apache.solr.cloud.Overseer – Update state numShards=null message={ numShards=null -- View this message in context: http://lucene.472066.n3.nabble.com/Collections-API-Reload-killing-my-cloud-tp4067141p4067151.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3, Tomcat, Error filterStart
On 5/30/2013 1:19 PM, Jonathan Rochkind wrote: Okay, sadly, i still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: OK, at this point, you've got Solr's logging configured, but your tomcat log won't be used -- the default logging destination has changed to log4j. You might need to edit the log4j.properties file so that it points at a location that exists - the default is logs/solr.log, relative to the current working directory of the tomcat process. Once the log4j destination gets created properly, you can look there for Solr's logs, which will hopefully give you additional insight. If you want it to work with tomcat exactly how it did before, then you can go back to the old logging method (java.util.logging) with another section on that page: http://wiki.apache.org/solr/SolrLogging#Switching_from_Log4J_back_to_JUL_.28java.util.logging.29 Thanks, Shawn
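For reference, a minimal log4j.properties along the lines of the one shipped in example/resources (the file path is the part most likely to need adjusting; treat this as a sketch rather than the exact shipped file):

# log INFO and above to the console and to a rolling file
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
# point this somewhere the tomcat user can actually write
log4j.appender.file.File=logs/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n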
SolrCloud running away with resources
I've set up a simple 10 node, 5 shard SolrCloud 4.3. I'm pushing just a few thousand documents into it. What I'm doing is rather write intensive: 100x or more writes than reads. I've noticed that there seems to be an unbounded use of resources. I'm seeing a steadily increasing number of network connections (monitored via netstat | wc -l, which returns over 5,500 and is growing by about 50 per minute) and over 2,200 open file descriptors (as shown on the Solr dashboard). It seems like something is not configured correctly. At some point, rather soon I'm afraid, I'll run out of resources. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-running-away-with-resources-tp4067154.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3, Tomcat, Error filterStart
Okay, for posterity: I did manage to get it working. It WAS lack of the logging files. First, the only way I could manage to get Tomcat6 to log an actual stacktrace for the Error filterStart was to _delete_ my CATALINA_HOME/conf/logging.properties file. Apparently without this file at all, the default ends up being 'log everything'. And once that happened, it did confirm that the Error filterStart problem WAS an inability to find the logging jars. (And the stack trace was an exception from Solr with a nice message including the URL to the logging wiki page, nice one solr). Nothing I tried before deleting that file entirely (in a fit of desperation) worked to get the stack trace logged. Once it was confirmed that the problem really was not finding the logging jars, I could keep doing things and restarting and seeing if that was still the exception. And I found that for some reason, despite http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html suggesting that jars could be found in either CATALINA_BASE/lib (for me /opt/tomcat6/lib) or CATALINA_HOME/lib (for me /usr/share/tomcat6/lib), in fact for whatever reason /opt/tomcat6/lib was being ignored, but /usr/share/tomcat6/lib worked. And now I successfully have solr started in tomcat. I realize that these are all tomcat6 issues, not solr issues. But others trying to get solr started may have similar problems. Appreciate the tip that the Error filterStart was probably related to the new solr 4.3.0 logging setup, which ended up confirmed. Jonathan On 5/30/2013 3:19 PM, Jonathan Rochkind wrote: Okay, sadly, i still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties there too. The result is unchanged, when I start tomcat, it still says: May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart May 30, 2013 3:15:00 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr] startup failed due to previous errors This is very frustrating. I have no way to even be sure this problem really is logging related, although it seems likely. But I feel like I'm just randomly moving chairs around and hoping the error will go away, and it does not. Is there anyone that has succesfully run Solr 4.3.0 in a Tomcat 6? Can we even confirm this is possible? Can anyone give me any other hints, especially does anyone have any idea how to get some more logging out of Tomcat, then the fairly useless Error filterSTart? The only reason I'm using tomcat is that we always have in our current Solr 1.4-based application, for reasons lost to time. I was hoping to upgrade to Solr 4.3, without simultaneously switching our infrastructure from tomcat to jetty, change one thing at a time. I suppose I might need to abandon that and switch to jetty too, but I'd rather not.
indexing documents
Good day everyone. I recently faced another problem. I've got a bunch of documents to index. The problem is that they are at the same time the database for another application. These documents are stored in JSON format with the following shape: { "id": 10, "name": "dad 177", "cat": [{ "id": 254, "name": "124" }] } When I try to post them, I get the following error: ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Unknown command: id [8] Is there a way to index these documents without changing them? How can I modify the schema, or do I need to do something else?
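The Unknown command error usually means the top level of the JSON is being read as a command list rather than as a document, and the nested cat object also will not map onto a flat schema; the 4.x JSON update handler expects an array of flat field/value documents. A sketch of a reshaped record (the flattened field names cat_id and cat_name are invented for illustration):

[
  { "id": "10",
    "name": "dad 177",
    "cat_id": ["254"],
    "cat_name": ["124"] }
]

That file can then be posted with the same curl command used earlier in the thread, e.g. curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @base.info -H 'Content-type:application/json'.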
Re: 2 VM setup for SOLRCLOUD?
Jamey, You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points. It doesn't matter, technically, which one though you will find benefits to narrowing traffic to fewer (for purposes of better cache management). Internally SolrCloud will round-robin distribute requests to other shards once a query begins execution. But you do need an entry point externally to be defined through your load balancer. Hope this is useful! Jason On May 30, 2013, at 12:48 PM, James Dulin jdu...@crelate.com wrote: Working to setup SolrCloud in Windows Azure. I have read over the solr Cloud wiki, but am a little confused about some of the deployment options. I am attaching an image for what I am thinking we want to do. 2 VM’s that will have 2 shards spanning across them. 4 Nodes total across the two machines, and a zookeeper on each VM. I think this is feasible, but, I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the “collection” etc.) Thanks! Jamey
2 VM setup for SOLRCLOUD?
Working to setup SolrCloud in Windows Azure. I have read over the solr Cloud wiki, but am a little confused about some of the deployment options. I am attaching an image for what I am thinking we want to do. 2 VM's that will have 2 shards spanning across them. 4 Nodes total across the two machines, and a zookeeper on each VM. I think this is feasible, but, I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the collection etc.) Thanks! Jamey
RE: solr 4.3: write.lock is not removed
Hi, Thanks very much for the explanation! Could we configure it to get the old behavior? I am asking about this option because our app has many small cores, so we prefer to create/close the writer on the fly (otherwise we may run into memory issues quickly). We also do not need NRT for now. Thanks very much for your help, Lisheng -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, May 30, 2013 11:35 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed : I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that after finishing : indexing : : write.lock : : is NOT removed. Later if I index again it still works OK. Only after I shutdown Tomcat : then write.lock is removed. This behavior caused some problem like I could not use luke : to observe indexed data. IIRC, This was an intentional change. In older versions of Solr the IndexWRiter was only opened if/when updates needed to be made, but that made it impossible to safely take advantage of some internal optimizations related to NRT IndexReader reloading, so the logic was modified to always keep the IndexWriter open as lon as the SolrCore is loaded. In general, your past behavior of pointing luke at a live solr index could have also produced problems if updates came into solr while luke had the write lock active. -Hoss
RE: solr starting time takes too long
Hi Eric, Thanks very much for helps (I should have responded sooner): 1/ My problem in 3.6 turned out to be much related to the fact I did not share schema, after using shareSchema, the start time is reduced up to 80% (to my great surprise, previously I thought burden is most in solrconfig). 2/ I just upgraded to solr 4.3, but somehow I did not see all the fixes mentioned in the WIKI (like shareConfig), I saw the resolution is Won't fix, do you have plan to put the fix into next release? Thanks and best regards, Lisheng -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, May 22, 2013 4:57 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Zhang: In 3.6, there's really no choice except to load all the cores on startup. 10 minutes still seems excessive, do you perhaps have a heavy-weight firstSearcher query? Yes, soft commits are 4.x only, so that's not your problem. There's a shareSchema option that tries to only load 1 copy of the schema that should help, but that doesn't help with loading solrconfig.xml. Also in the 4.3+ world there's the option to lazily-load cores, see: http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not an option, but I thought I'd mention it. But I'm afraid you're stuck. You might be able to run bigger hardware (perhaps you're memory-starved). Other than that, you may need to use more than one machine to get fast enough startup times. Best, Erick On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Thanks very much for quick helps! I searched but it seems that autoSoftCommit is solr 4x feature and we are still using 3.6.1? Best regards, Lisheng -Original Message- From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] Sent: Wednesday, May 22, 2013 12:17 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Hi Lisheng, I had the same problem when I enabled the autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem, Cheers. Carlos. 2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). Examing log file and code we found that for each core we loaded many resources, but in our app, we are sure we are always using the same solrconfig.xml and schema.xml for all cores. While we can config schema.xml to be shared, we cannot share SolrConfig object. But looking inside SolrConfig code, we donot use any of the cache. Could we somehow change config (or source code) to share resource between cores to reduce solr starting time? Thanks very much for helps, Lisheng
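For anyone searching for it later, shareSchema is set on the cores element of the legacy-style solr.xml, something like the sketch below; core names and paths here are placeholders:

<solr persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001" />
    <core name="core0002" instanceDir="core0002" />
    <!-- ...one core element per core... -->
  </cores>
</solr>

It only pays off when the cores genuinely point at identical schema.xml files, which is the situation described above.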
Re: OPENNLP problems
I will look at these problems. Thanks for trying it out! Lance Norskog On 05/28/2013 10:08 PM, Patrick Mi wrote: Hi there, Checked out branch_4x and applied the latest patch LUCENE-2899-current.patch; however I ran into 2 problems. Followed the wiki page instructions and set up a field with this type, aiming to keep nouns and verbs and do a facet on the field:
==
<fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==
Struggled to get that going until I put the extra parameter keepPayloads="true" in, as below:

<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>

Question: am I doing the right thing? Is this a mistake on the wiki? Second problem: Posted the document XML one by one to Solr and the result was what I expected:

<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
</add>

However, if I put multiple documents into the same XML file and post it in one go, only the first document gets processed (only 'check' and 'hotel' were showing in the facet result):

<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="text_opennlp_nvf">removes the payloads</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="text_opennlp_nvf">retains only nouns and verbs</field>
  </doc>
</add>

Same problem when updating the data via CSV upload. Is that a bug or something I did wrong? Thanks in advance! Regards, Patrick
RE: solr 4.3: write.lock is not removed
I did more tests and it seems that this is still a bug (previous issue 3/): 1/ Create a core via a CURL command with dataDir=some_folder; the core is created OK and later indexing works OK also. 2/ But in solr.xml, dataDir is not defined on the core element. 3/ After restarting Solr, the dataDir information is lost and Solr issues a WARN. 4/ If I manually add a dataDir attribute to the core element in solr.xml after the core is created, restarting Solr is fine. Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng Sent: Thursday, May 30, 2013 11:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: solr 4.3: write.lock is not removed I did more tests and got more info: the basic setting is that we create the core from the PHP CURL API, where we define: schema, config, instanceDir=my_solr_home, dataDir=my_solr_home/data/new_collection_name. In Solr 3.6.1 we do not need to define schema/config because the conf folder is not inside each collection. 1/ Indexing works OK but write.lock is not removed (we use /update?commit=true..) 2/ Shut down Tomcat; I saw write.lock is gone. 3/ Restart Tomcat; indexed data was created at the instanceDir/data level, with some warning messages. It seems that in solr.xml, dataDir is not defined? Thanks very much for your help, Lisheng -Original Message- From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com] Sent: Thursday, May 30, 2013 10:57 AM To: solr-user@lucene.apache.org Subject: RE: solr 4.3: write.lock is not removed Hi, We just use CURL from PHP code to submit the indexing request, like: /update?commit=true.. This worked well in Solr 3.6.1. I saw the link you sent and really appreciate it (if there is no other choice I will change the Java source code, but I hope there is a better way..). Thanks very much for your help, Lisheng -Original Message- From: bbarani [mailto:bbar...@gmail.com] Sent: Thursday, May 30, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Re: solr 4.3: write.lock is not removed How are you indexing the documents? Are you using an indexing program? The post below discusses the same issue: http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html Sent from the Solr - User mailing list archive at Nabble.com.
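For anyone hitting the same thing: the workaround in point 4/ amounts to making sure the core entry in solr.xml carries the same dataDir that was passed to the CoreAdmin CREATE call. A rough sketch with hypothetical names and paths (not the exact commands from this thread):

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_collection&instanceDir=my_solr_home&dataDir=my_solr_home/data/new_collection

<!-- legacy solr.xml: persisting dataDir on the core element keeps it across restarts -->
<core name="new_collection" instanceDir="my_solr_home" dataDir="my_solr_home/data/new_collection"/>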
Strip HTML Tags and Store
Hi AllI am trying to understand what gets stored when i configure a field indexed and stored for example i have this in my schema.xmlfield name=articleBody type=text_general indexed=true stored=true /and fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType I was expecting that solr will index store html strip content when i invoke query i get some thing like this str name=articleBodyxhtml:h1xhtml:bSouth African Miners Are Trapped by Debt/xhtml:b/xhtml:h1 xhtml:pxhtml:b▸ A surge in high-interest lending contributes to mine violence/xhtml:b/xhtml:p xhtml:pxhtml:b▸ At least one bank “may have reckless lending problems”/xhtml:b/xhtml:p xhtml:pIn 2008, platinum miner James Ntseane borrowed 8,000 rand ($886) from xhtml:bAfrican Bank Investments/xhtml:b to pay for his grandmother's funeral. Soon after, he took out two more loans, totaling 10,000 rand, for a sofa and house extension. Four years later he owes at least 30,515 rand, according to text messages he gets from African Bank, South Africa's biggest provider of unsecured loans. Under a court-ordered payment plan, his employer garnishes about 13 percent of his monthly 12,600-rand salary for the lender. He doesn't know how much interest he's paying. “They are taking too much money,” says Ntseane, 41./xhtml:p xhtml:pNtseane is one of more than 9 million South Africans mired in debt. African Bank, xhtml:bBayport Financial Services, Capitec Bank Holdings/xhtml:b, and other firms have led a boom in unsecured lending, charging interest as high as 80 percent a year, as is allowed there. Last year a series of strikes led to at least 46 deaths, the country's worst mining violence since the end of apartheid. “One of the contributing factors to all of these strikes has been this surge in unsecured lending,” says Mike Schussler, chief economist at the research group a href=http://economists.co.za/;Economists.co.za/a, echoing an October statement by Trade and Industry Minister Rob Davies./xhtml:p xhtml:pThe value of consumer loans not backed by assets such as homes rose 39 percent in the year through September, to 140 billion rand, reports the National Credit Regulator. The loans made up 10 percent of consumer credit on Sept. 30, up from 8 percent a year earlier. In November, South Africa's National Treasury and the Banking Association of South Africa agreed to review lending affordability rules, improve client education, and reduce wage garnishing after the number of people with bad credit rose to a record. Finance Minister Pravin Gordhan called the rise “worrying” a week earlier./xhtml:p xhtml:pGeorge Roussos, an executive for central support services at African Bank, says miner Ntseane borrowed more than he claims and took out a credit card. (The bank received permission from Ntseane, who denies the bank's figures, to discuss his account with xhtml:iBloomberg Businessweek/xhtml:i.) 
The bank says it stopped charging interest in 2011 and has no record of Ntseane making contact after he was injured in a home robbery in 2010. “The bank attempts to communicate clearly and transparently, employing multilingual consultants,” says Roussos./xhtml:p xhtml:pSouth African lenders have re sorted to court-ordered wage garnishing in more than 3 million active cases, according to the National Debt Mediation Association, a credit industry group that provides consumer debt counseling. Kem Westdyk, chief executive of xhtml:bSummit Garnishee Solutions/xhtml:b, which helps mining companies review bank requests, says at some companies up to 15 percent of workers have wages garnished; at one, more than a quarter of those cases involve African Bank. “They may have reckless lending problems,” says Westdyk, adding that some workers have five or six garnishee orders against them./xhtml:p xhtml:pNtseane says his loan agent didn't mention garnishment when she agreed to delay his loan payments. Although Davies and the country's credit regulator have pledged to clamp down on unsecured lending, Ntseane doesn't have high hopes. “I don't know when I will stop paying,” he says./xhtml:p xhtml:p prism:class=bylinexhtml:i—Franz Wild, Mike Cohen, and Renee Bonorchis/xhtml:i/xhtml:p xhtml:pxhtml:ixhtml:bThe bottom line/xhtml:b
Re: Reindexing strategy
On 5/30/2013 8:30 AM, Dotan Cohen wrote: On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and something you hopefully already know: you'll want to get baseline measurements before you begin testing, for comparison. Thanks. I wasn't looking for hard numbers, but rather for the signs of problems. I know to keep my eye on memory and CPU, but I have no idea how to check disk I/O, and I'm not sure how to determine whether it becomes saturated. On UNIX platforms, take a look at vmstat for basic I/O measurement, and iostat for more detailed stats. One coarse measurement is the number of blocked/waiting processes - usually this is due to I/O contention - and you will want to look at the paging and swapping numbers - you don't want any swapping at all. But the best single number to look at is overall disk activity, which is the I/O percentage-utilized number Shawn was mentioning. -Mike
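To make that concrete, two commands worth keeping open while a reindex runs (assuming a Linux box with the sysstat package installed; column names vary slightly between versions):

# refresh every 5 seconds; a %util column near 100 means that device is close to saturation
iostat -x 5

# the 'b' column counts processes blocked on I/O; 'si'/'so' should stay at 0 (no swapping)
vmstat 5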
RE: Support for Mongolian language
What would be the steps if we want to use Mongolian or any other language that is not supported? -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 30, 2013 5:43 PM To: solr-user@lucene.apache.org Subject: Re: Support for Mongolian language No, there is not. -- Jack Krupansky
Re: Strip HTML Tags and Store
Update Request Processors to the rescue again. Namely, the HTML Strip Field update processor. Add to your solrconfig:

<updateRequestProcessorChain name="html-strip-features">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">features</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

See: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html

Index content:

curl "http://localhost:8983/solr/update?commit=true&update.chain=html-strip-features" \
 -H 'Content-type:application/json' -d '
[{"id": "doc-1",
  "title": "&lt;Hello World&gt;",
  "features": "<p>This is a <a>test</a> line &gt;.",
  "other_t": "<p>Other <b>text</b></p>",
  "more_t": "Some <b>more <i>text</i>.</b> The end"}]'

Results:

"id":"doc-1",
"title":["&lt;Hello World&gt;"],
"features":["\nThis is a test line >."],
"other_t":"<p>Other <b>text</b></p>",
"more_t":"Some <b>more <i>text</i>.</b> The end",

That stripped the HTML only from the features field, and expanded the named character entity as well. Add multiple <str> elements for multiple fields, or use fieldRegex, or... some other options. See: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html -- Jack Krupansky -Original Message- From: Kalyan Kuram Sent: Thursday, May 30, 2013 8:18 PM To: solr-user@lucene.apache.org Subject: Strip HTML Tags and Store
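If you would rather not pass update.chain on every request, one option (a sketch, assuming the stock /update handler) is to make the chain the handler's default in solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">html-strip-features</str>
  </lst>
</requestHandler>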
Re: Support for Mongolian language
Well, you would need a tokenizer, probably a stemmer, and a list of stop-words (to ignore). Is the original text in UTF-8 or in some alternative encoding? A quick search showed that there is an academic paper where they are trying to get Mongolian into Lucene. It seems quite relevant and would be a great place to start: http://scholar.google.ca/scholar?cluster=15851397934729234574&hl=en&as_sdt=0,5 It also lists a lot of the challenges that came up with other languages before UTF-8 became the main standard (Russian and Ukrainian come to mind). Hope it helps, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 30, 2013 at 10:49 PM, Sagar Chaturvedi sagar.chaturv...@nectechnologies.in wrote: What would be the steps if we want to use Mongolian or any other language that is not supported?
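As a concrete starting point along those lines, a minimal sketch of a schema.xml field type for a language with no dedicated analyzer; the stopwords file name is hypothetical, and there is no stemmer here because Lucene does not ship one for Mongolian:

<fieldType name="text_mn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Unicode-aware word breaking; a reasonable default when no language-specific tokenizer exists -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- hand-built stop-word list, one word per line, saved as UTF-8 -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_mn.txt"/>
  </analyzer>
</fieldType>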
RE: Support for Mongolian language
Thanks Alexandre for the link. It was really helpful. The original text will be in UTF-8.
Re: Support for Mongolian language
Try using the text_general field type and see how reasonable or unreasonable the standard tokenizer is at identifying word breaks for some sample Mongolian text. Use the Solr Admin UI Analysis page to see what the various term analysis filters output. -- Jack Krupansky
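The same check can also be scripted against the field-analysis request handler that ships in the example configs (a sketch; the core name and sample text are placeholders):

curl -G "http://localhost:8983/solr/collection1/analysis/field" \
  --data-urlencode "analysis.fieldtype=text_general" \
  --data-urlencode "analysis.fieldvalue=PASTE SAMPLE MONGOLIAN TEXT HERE" \
  --data-urlencode "wt=json"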
RE: Support for Mongolian language
Hi, In the Solr admin UI, I am trying to highlight some fields in a query. I have set hl=true and given a comma-separated list of field names in hl.fl, but the fields are not getting highlighted. Any insights? Regards, Sagar
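For comparison, a request of this shape normally does return highlights (field names here are hypothetical; the fields listed in hl.fl must be stored, and the snippets come back in a separate "highlighting" section of the response rather than inside the matched documents):

http://localhost:8983/solr/collection1/select?q=articleBody:debt&hl=true&hl.fl=title,articleBody&wt=json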
Highlighting fields
Sorry for wrong subject. Corrected it. -Original Message- From: Sagar Chaturvedi [mailto:sagar.chaturv...@nectechnologies.in] Sent: Friday, May 31, 2013 11:25 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language