Re: how often do you boys restart your tomcat?

2011-07-27 Thread Bernd Fehling

Until now I used Jetty, and two weeks was the longest uptime before an OOM.
I just switched to tomcat6 and will see how that one behaves, but
I think it's not a problem of the servlet container.
Solr is pretty unstable when it has a huge database.
Actually this can't be blamed directly on Solr; it is a problem of
Lucene and its fieldCache. Somehow, during two weeks of runtime with searching
and replication, the fieldCache doubles in size until OOM.

Currently there is no other solution to this than restarting your
tomcat or jetty regularly :-(


On 27.07.2011 03:42, Bing Yu wrote:

I find that if I do not restart the master's Tomcat for some days,
the load average keeps rising to a high level and Solr becomes slow
and unstable, so I added a crontab entry to restart Tomcat every day.

Do you boys restart your Tomcat? And is there any way to avoid restarting Tomcat?


Re: how often do you boys restart your tomcat?

2011-07-27 Thread Paul Libbrecht
On curriki.org, our solr's Tomcat saturates memory after 2-4 weeks.
I am still investigating if I am accumulating something or something else is.

To check it, I run a match-all query every minute that returns only the
number of results, and measure the time it takes. I generally start to worry
when it hits a big GC that causes a timeout. Memory then starts to be hogged,
but things get back to normal as soon as the GC is done.
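
A minimal sketch of such a once-a-minute probe, for illustration only. The base URL, the 10-second timeout and the helper names are assumptions, not taken from the setup described above; q=*:* with rows=0 is the standard way to ask Solr for a hit count without fetching documents.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr location; adjust host, port and core path to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_probe_url(base=SOLR_SELECT):
    # Match-all query with rows=0: only the hit count comes back.
    return base + "?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})


def http_fetch(url, timeout):
    # Plain HTTP GET; raises an OSError subclass on timeout or network error.
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def timed_probe(fetch, url, timeout=10.0):
    """Call fetch(url, timeout) and return (elapsed_seconds, ok)."""
    start = time.monotonic()
    try:
        fetch(url, timeout)
        ok = True
    except OSError:  # URLError and socket timeouts are OSError subclasses
        ok = False
    return time.monotonic() - start, ok


def monitor(rounds, interval=60.0):
    # Probe `rounds` times, sleeping `interval` seconds between probes.
    for _ in range(rounds):
        elapsed, ok = timed_probe(http_fetch, build_probe_url())
        print(f"{time.strftime('%H:%M:%S')} ok={ok} elapsed={elapsed:.3f}s")
        time.sleep(interval)
```

Calling monitor(1440) would cover roughly one day; a probe that comes back with ok=False or an unusually large elapsed time is the moment to look at GC activity.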

I had other tomcat servers with very long uptimes (more than 6 months) so I do 
not think tomcat is guilty.

Currently I can only show the freememory of the system and what's in 
solr-stats, but I do not know what to look at really...

paul

On 27 Jul 2011, at 03:42, Bing Yu wrote:

 I find that if I do not restart the master's Tomcat for some days,
 the load average keeps rising to a high level and Solr becomes slow
 and unstable, so I added a crontab entry to restart Tomcat every day.
 
 Do you boys restart your Tomcat? And is there any way to avoid restarting
 Tomcat?



Re: how often do you boys restart your tomcat?

2011-07-27 Thread Bernd Fehling


It is definitely Lucene's fieldCache making the trouble.
Restart your Solr and monitor it with jvisualvm, especially the OldGen heap.
When it is 100 percent full, use jmap to dump the heap of your system.
Then use Eclipse Memory Analyzer http://www.eclipse.org/mat/ and
open the heap dump. You will see a pie chart and can easily identify
the largest consumer of your heap space.
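
As a rough sketch of the jmap step, the helper below builds and runs the dump command. The function names, the PID and the output path are placeholders; -dump:format=b,file=... is the jmap option for a binary heap dump that Memory Analyzer can open. It assumes a JDK on the PATH and that the script runs as the same OS user as the Solr JVM.

```python
import subprocess


def jmap_dump_command(pid, out_file):
    # -dump:format=b,file=<path> asks jmap for a binary (hprof) heap dump.
    return ["jmap", f"-dump:format=b,file={out_file}", str(pid)]


def dump_heap(pid, out_file):
    # Runs jmap; raises CalledProcessError if jmap exits with an error.
    subprocess.run(jmap_dump_command(pid, out_file), check=True)
```

For example, dump_heap(12345, "/tmp/solr-heap.hprof") would write a dump for the (hypothetical) process 12345, which can then be opened in Eclipse Memory Analyzer.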



On 27.07.2011 09:02, Paul Libbrecht wrote:

On curriki.org, our solr's Tomcat saturates memory after 2-4 weeks.
I am still investigating if I am accumulating something or something else is.

To check it, I run a match-all query every minute that returns only the
number of results, and measure the time it takes. I generally start to worry
when it hits a big GC that causes a timeout. Memory then starts to be hogged,
but things get back to normal as soon as the GC is done.

I had other tomcat servers with very long uptimes (more than 6 months) so I do 
not think tomcat is guilty.

Currently I can only show the freememory of the system and what's in 
solr-stats, but I do not know what to look at really...

paul

On 27 Jul 2011, at 03:42, Bing Yu wrote:


I find that if I do not restart the master's Tomcat for some days,
the load average keeps rising to a high level and Solr becomes slow
and unstable, so I added a crontab entry to restart Tomcat every day.

Do you boys restart your Tomcat? And is there any way to avoid restarting Tomcat?




Re: Conditional field values in DataImport

2011-07-27 Thread Gora Mohanty
On Wed, Jul 27, 2011 at 7:20 AM, solruser@9913 gunaranj...@yahoo.com wrote:
 This may be a trivial question - I am noob :).
 In the dataimport of a CSV file, am trying to assign a field based on a
 conditional check on another field.

 E.g.
   <field name="rawLine" regex="CSV-splitting-regex" groupNames="X,Y,Z" />

   this works well.  However I need to create another field A that is
 assigned a value based on X.
[...]

A ScriptTransformer should do the job. Please see
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer

Regards,
Gora


Re: Different options for autocomplete/autosuggestion

2011-07-27 Thread scorpking
Hi Bell,
I used autocomplete in Solr 3.1, like this:

<searchComponent name="autocomplete" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">autocomplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
    <str name="field">autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

I followed http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to index my
data, and ran into a problem. With one word it works very well, but when I
type two or more words, the results returned are not right. I don't know why.
Does anyone know this problem? Thanks for your help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-options-for-autocomplete-autosuggestion-tp2678899p3203032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr vs ElasticSearch

2011-07-27 Thread Tarjei Huse
On 06/01/2011 08:22 AM, Jason Rutherglen wrote:
 Thanks Shashi, this is oddly coincidental with another issue being put
 into Solr (SOLR-2193) to help solve some of the NRT issues, the timing
 is impeccable.
Hmm, does anyone have an idea on when this will be finished?

I'm considering if I should wait for the patch to solidify or if I
should switch to ES.
 At a base however Solr uses Lucene, as does ES.  I think the main
 advantage of ES is the auto-sharding etc.  I think it uses a gossip
 protocol to capitalize on this however... Hmm...
Yes it looks nice.
T
 On Tue, May 31, 2011 at 10:01 PM, Shashi Kant sk...@sloan.mit.edu wrote:
 Here is a very interesting comparison

 http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/


 -Original Message-
 From: Mark
 Sent: May-31-11 10:33 PM
 To: solr-user@lucene.apache.org
 Subject: Solr vs ElasticSearch

 I've been hearing more and more about ElasticSearch. Can anyone give me a
 rough overview on how these two technologies differ. What are the
 strengths/weaknesses of each. Why would one choose one of the other?

 Thanks




-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
Hi,

I am new to Solr. I have downloaded the Solr 3.3.0 distribution and am trying to
run it using java -jar start.jar from the apache-solr-3.3.0\example directory
(start.jar is present there). But I get the following error when running this
command:

C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar
java.lang.NullPointerException
    at java.io.File.<init>(File.java:222)
    at org.mortbay.start.Main.init(Main.java:465)
    at org.mortbay.start.Main.start(Main.java:439)
    at org.mortbay.start.Main.main(Main.java:119)

Could someone help me resolve this issue?

Thanks & Regards
Anand Nigam


***
 
The Royal Bank of Scotland plc. Registered in Scotland No 90312. 
Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority. The 
Royal Bank of Scotland N.V. is authorised and regulated by the 
De Nederlandsche Bank and has its seat at Amsterdam, the 
Netherlands, and is registered in the Commercial Register under 
number 33002587. Registered Office: Gustav Mahlerlaan 350, 
Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and 
The Royal Bank of Scotland plc are authorised to act as agent for each 
other in certain jurisdictions. 
  
This e-mail message is confidential and for use by the addressee only. 
If the message is received by anyone other than the addressee, please 
return the message to the sender by replying to it and then delete the 
message from your computer. Internet e-mails are not necessarily 
secure. The Royal Bank of Scotland plc and The Royal Bank of Scotland 
N.V. including its affiliates (RBS group) does not accept responsibility 
for changes made to this message after it was sent. For the protection
of RBS group and its clients and customers, and in compliance with
regulatory requirements, the contents of both incoming and outgoing
e-mail communications, which could include proprietary information and
Non-Public Personal Information, may be read by authorised persons
within RBS group other than the intended recipient(s). 

Whilst all reasonable care has been taken to avoid the transmission of 
viruses, it is the responsibility of the recipient to ensure that the onward 
transmission, opening or use of this message and any attachments will 
not adversely affect its systems or data. No responsibility is accepted 
by the RBS group in this regard and the recipient should carry out such 
virus and other checks as it considers appropriate. 

Visit our website at www.rbs.com 

***
  


Re: Autocomplete with Solr 3.1

2011-07-27 Thread O. Klein
I know the solution, just not how to actually implement it, but maybe
somebody can help with that :)

From Wiki:

If you want to use a dictionary file that contains phrases (actually,
strings that can be split into multiple tokens by the default
QueryConverter) then define a different QueryConverter like this:

  
  <queryConverter name="queryConverter"
                  class="org.apache.solr.spelling.MySpellingQueryConverter"/>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3203191.html


Re: How to make a valid date facet query?

2011-07-27 Thread Tomás Fernández Löbbe
Hi Floyd, yes, those queries are supported. Make sure you use the
right encoding for the plus sign:

facet.query=onlinedate:[NOW/YEAR-3YEARS TO NOW/YEAR%2B5YEARS]

the result of this facet query will be the number of documents in the result
set that match that range. You'll have to use different facet queries for
the different ranges to achieve what you want.
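
A small sketch of building one facet.query per range with correct URL encoding. The field name comes from the thread; the four ranges are only one possible reading of "1 month, 3 months, 6 months, more than 1 year" and are an assumption. urlencode turns the literal plus sign into %2B and spaces into +, which is exactly the distinction that matters here.

```python
from urllib.parse import urlencode


def date_facet_params(field, ranges):
    # One facet.query parameter per (start, end) range.
    params = [("facet", "true")]
    for start, end in ranges:
        params.append(("facet.query", f"{field}:[{start} TO {end}]"))
    return urlencode(params)


# Assumed ranges: last month, last 3 months, last 6 months, older than a year.
RANGES = [
    ("NOW-1MONTH", "NOW"),
    ("NOW-3MONTHS", "NOW"),
    ("NOW-6MONTHS", "NOW"),
    ("*", "NOW-1YEAR"),
]

query_string = date_facet_params("onlinedate", RANGES)
```

The resulting query_string can be appended to the select URL after the q parameter; each facet.query then shows up as its own count in the facet_queries section of the response.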

Regards,

Tomás

On Wed, Jul 27, 2011 at 12:43 AM, Floyd Wu floyd...@gmail.com wrote:

 Hi Tomás

 Is facet queries support following queries?

 facet.query=onlinedate:[NOW/YEAR-3YEARS TO NOW/YEAR+5YEARS]

 I tried this but returned result was not correct.

 Am I missing something?

 Floyd

 2011/7/26 Tomás Fernández Löbbe tomasflo...@gmail.com

  Hi Floyd, I don't think the feature that allows to use multiple gaps for
 a
  range facet is committed. See
  https://issues.apache.org/jira/browse/SOLR-2366
  You can achieve a similar functionality by using facet.query. see:
 
 
 http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields_and_Facet_Queries
 
  Regards,
 
  Tomás
  On Tue, Jul 26, 2011 at 1:23 AM, Floyd Wu floyd...@gmail.com wrote:
 
   Hi all,
  
   I need to make date faceted query and I tried to use facet.range but
  can't
   get result I need.
  
   I want to make 4 facet like following.
  
   1 Months,3 Months, 6Months, more than 1 Year
  
   The onlinedate field in schema.xml like this
  
   <field name="onlinedate" type="tdate" indexed="true" stored="true"/>
  
   I hit the solr by this url
  
   http://localhost:8983/solr/select/?q=*%3A*
   &start=0
   &rows=10
   &indent=on
   &facet=true
   &facet.range=onlinedate
   &f.onlinedate.facet.range.start=NOW-1YEARS
   &f.onlinedate.facet.range.end=NOW%2B1YEARS
   &f.onlinedate.facet.range.gap=NOW-1MONTHS, NOW-3MONTHS,
   NOW-6MONTHS,NOW-1YEAR
  
   But the solr complained Exception during facet.range of onlinedate
   org.apache.solr.common.SolrException: Can't add gap NOW-1MONTHS,
   NOW-3MONTHS, NOW-6MONTHS,NOW-1YEAR to value Mon Jul 26 11:56:40 CST
 2010
   for
   
  
   What is correct way to make this requirement to realized? Please help
 on
   this.
   Floyd
  
 



what data type for geo fields?

2011-07-27 Thread Peter Wolanin
Looking at the example schema:

http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml

the solr.PointType field type uses double (is this just an example
field, or is it used for geo search?), while the solr.LatLonType field uses
tdouble. It's unclear how the geohash is translated into lat/lon
values, or whether the geohash itself would typically be used as a copyField
and used just for matching a query on a geohash.

Is there an advantage in terms of speed to using Trie fields for
solr.LatLonType?  I would assume so, e.g. for bbox operations.

Thanks,

Peter

-- 
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 978-296-5247

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com;


Delete by range query

2011-07-27 Thread Mohammad Shariq
Hi,
I want to delete a bunch of docs from my Solr using a range query.
I have one field called 'time', which is a tint.

I am deleting using the query:
<delete><query>time:[1296777600+TO+1296778000]</query></delete>

but Solr returns an error saying bad request.
However, I am able to delete one by one using the delete query below:
<delete><query>time:1296777600</query></delete>

Please suggest any solution to this problem.

-- 
Thanks and Regards
Mohammad Shariq


Re: Solr vs ElasticSearch

2011-07-27 Thread Jeff Schmidt
You might also check out Solandra:

https://github.com/tjake/Solandra

With Solr's configuration and indexes in Cassandra, you can benefit from 
replication, distribution etc., and still have Cassandra available for non-Solr 
specific purposes.

Cheers,

Jeff

On Jul 27, 2011, at 5:17 AM, Tarjei Huse wrote:

 On 06/01/2011 08:22 AM, Jason Rutherglen wrote:
 Thanks Shashi, this is oddly coincidental with another issue being put
 into Solr (SOLR-2193) to help solve some of the NRT issues, the timing
 is impeccable.
 Hmm, does anyone have an idea on when this will be finished?
 
 I'm considering if I should wait for the patch to solidify or if I
 should switch to ES.
 At a base however Solr uses Lucene, as does ES.  I think the main
 advantage of ES is the auto-sharding etc.  I think it uses a gossip
 protocol to capitalize on this however... Hmm...
 Yes it looks nice.
 T
 On Tue, May 31, 2011 at 10:01 PM, Shashi Kant sk...@sloan.mit.edu wrote:
 Here is a very interesting comparison
 
 http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
 
 
 -Original Message-
 From: Mark
 Sent: May-31-11 10:33 PM
 To: solr-user@lucene.apache.org
 Subject: Solr vs ElasticSearch
 
 I've been hearing more and more about ElasticSearch. Can anyone give me a
 rough overview on how these two technologies differ. What are the
 strengths/weaknesses of each. Why would one choose one of the other?
 
 Thanks
 
 
 
 
 -- 
 Regards / Med vennlig hilsen
 Tarjei Huse
 Mobil: 920 63 413
 



--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068

Re: what data type for geo fields?

2011-07-27 Thread Yonik Seeley
On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin peter.wola...@acquia.com wrote:
 Looking at the example schema:

 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml

 the solr.PointType field type uses double (is this just an example
 field, or used for geo search?)

While you could possibly use PointType for geo search, it doesn't have
good support for it (it's more of a general n-dimension point)
The LatLonType has all the geo support currently.

, while the solr.LatLonType field uses
 tdouble and it's unclear how the geohash is translated into lat/lon
 values or if the geohash itself might typically be used as a copyfield
 and use just for matching a query on a geohash?

There's no geohash used in LatLonType
It is indexed as a lat and lon under the covers (using the suffix _d)

 Is there an advantage in terms of speed to using Trie fields for
 solr.LatLonType?

Currently only for explicit range queries... like point:[10,10 TO 20,20]

  I would assume so, e.g. for bbox operations.

It's a bit of an implementation detail, but bbox doesn't currently use
range queries.

-Yonik
http://www.lucidimagination.com


Re: Delete by range query

2011-07-27 Thread Koji Sekiguchi

<delete><query>time:[1296777600+TO+1296778000]</query></delete>


Should be <delete><query>time:[1296777600 TO 1296778000]</query></delete> ?
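
To make the difference concrete: in a POSTed XML body the plus sign is a literal character, not an encoded space, so the range needs real spaces around TO. A minimal sketch that builds the delete message (the helper name is made up for illustration; the field and values are the ones from this thread):

```python
import xml.etree.ElementTree as ET


def delete_by_range(field, low, high):
    # Build <delete><query>field:[low TO high]</query></delete>.
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    query.text = f"{field}:[{low} TO {high}]"
    return ET.tostring(delete, encoding="unicode")


body = delete_by_range("time", 1296777600, 1296778000)
```

The string in body is what would be POSTed to the update handler, followed by a commit.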

koji
--
http://www.rondhuit.com/en/


Re: Solr vs ElasticSearch

2011-07-27 Thread Yonik Seeley
On Wed, Jul 27, 2011 at 7:17 AM, Tarjei Huse tar...@scanmine.com wrote:
 On 06/01/2011 08:22 AM, Jason Rutherglen wrote:
 Thanks Shashi, this is oddly coincidental with another issue being put
 into Solr (SOLR-2193) to help solve some of the NRT issues, the timing
 is impeccable.
 Hmm, does anyone have an idea on when this will be finished?

It's in trunk now... try it out!

-Yonik
http://www.lucidimagination.com


Re: Delete by range query

2011-07-27 Thread Mohammad Shariq
Thanks Koji,
It's working now.


On 27 July 2011 19:30, Koji Sekiguchi k...@r.email.ne.jp wrote:

 <delete><query>time:[1296777600+TO+1296778000]</query></delete>


 Should be <delete><query>time:[1296777600 TO 1296778000]</query></delete> ?

 koji
 --
 http://www.rondhuit.com/en/




-- 
Thanks and Regards
Mohammad Shariq


Re: using distributed search with the suggest component

2011-07-27 Thread Tobias Rübner
Thanks, but this does not work.
Looking at the log files, I see only one request when executing a search.
When executing a request to the default servlet (/select) with multiple shards,
each core gets asked for the current query.

Any other suggestions?
Tobias


On Tue, Jul 26, 2011 at 2:11 PM, mdz-munich sebastian.lu...@bsb-muenchen.de
 wrote:

 Hi Tobias,

 try this, it works for us (Solr 3.3):

 solrconfig.xml:

 <searchComponent name="suggest" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">word</str>
   <lst name="spellchecker">
     <str name="name">suggestion</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
     <str name="field">wordCorpus</str>
     <str name="comparatorClass">score</str>
     <str name="storeDir">./suggester</str>
     <str name="buildOnCommit">false</str>
     <str name="buildOnOptimize">true</str>
     <float name="threshold">0.005</float>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="omitHeader">true</str>
     <str name="spellcheck">true</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.dictionary">suggestion</str>
     <str name="spellcheck.count">50</str>
     <str name="spellcheck.maxCollations">50</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>

 Query like that:


 http://localhost:8080/solr/core.01/suggest?q=wordPrefix&shards=localhost:8080/solr/core.01,localhost:8080/solr/core.02&shards.qt=/suggest


 Greetz,

 Sebastian



 Tobias Rübner wrote:
 
  Hi,
 
  I try to use the suggest component (solr 3.3) with multiple cores.
  I added a search component and a request handler as described in the docs
  (
  http://wiki.apache.org/solr/Suggester) to my solrconfig.
  That works fine for 1 core but querying my solr instance with the shards
  parameter does not query multiple cores.
  It just ignores the shards parameter.
 
 http://localhost:/solr/core1/suggest?q=sa&shards=localhost:/solr/core1,localhost:/solr/core2
 
  The documentation of the SpellCheckComponent (
 
 http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support
 )
  is a bit vague on that point, because I don't know if this feature really
  works with Solr 3.3. It is targeted for Solr 1.5, which will never come,
  but says it is now available.
  I also tried the shards.qt parameter, but it does not change my results.
 
  Thanks for any help,
  Tobias
 


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/using-distributed-search-with-the-suggest-component-tp3197651p3200143.html



RE: Problem starting solr on jetty

2011-07-27 Thread Steven A Rowe
Hi Anand,

Someone else reported this exact same error with Solr v1.4.0: 
http://www.lucidimagination.com/search/document/fd5b83f3595a1c6c/can_t_start_solr_by_java_jar_start_jar

I downloaded the apache-solr-3.3.0.zip, unpacked it, then ran 'java -jar 
start.jar' from the cmdline.  It worked.  (Windows 7; Oracle Java 1.6.0_23).

I tried to reproduce the error you're seeing, by making the example\ directory 
and all its contents read-only (different exception: FileNotFound), and by 
removing the entire contents of the example\ directory except for start.jar 
(nothing happens - it just quits without printing anything out).

Can you give more details about your environment?

Steve

-Original Message-
From: anand.ni...@rbs.com [mailto:anand.ni...@rbs.com] 
Sent: Wednesday, July 27, 2011 7:25 AM
To: solr-user@lucene.apache.org
Subject: Problem starting solr on jetty

Hi,

I am new to Solr. I have downloaded the Solr 3.3.0 distribution and am trying to
run it using java -jar start.jar from the apache-solr-3.3.0\example directory
(start.jar is present there). But I get the following error when running this
command:

C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar
java.lang.NullPointerException
    at java.io.File.<init>(File.java:222)
    at org.mortbay.start.Main.init(Main.java:465)
    at org.mortbay.start.Main.start(Main.java:439)
    at org.mortbay.start.Main.main(Main.java:119)

Could someone help me resolve this issue?

Thanks & Regards
Anand Nigam




Why Slop doesn't match anything?

2011-07-27 Thread Alexander Ramos Jardim
Hello pals,

Using Solr 1.4.0, I am trying to understand something. When I run the query
fieldA:nokia c3, I get 5 results, all with nokia c3, as expected. But when I run
fieldA:nokia c3~100, I don't get any results!

As far as I understand, the ~100 should make my query bring back even more
results, as not only documents with nokia c3 in their fieldA will be found.
Something like nokia blue c3 should match too. Right?

So why don't I get any results? Is there any known bug?


-- 
Alexander Ramos Jardim


Re: Dealing with keyword stuffing

2011-07-27 Thread Gora Mohanty
On Wed, Jul 27, 2011 at 7:15 PM, Pranav Prakash pra...@gmail.com wrote:
 I guess most of you have already handled and many of you might still be
 handling keyword stuffing. Here is my scenario. We have a huge index
 containing about 6m docs. (Not sure if that is huge :-) And every document
 contains title, description, tags, content (textual data). People have been
 doing keyword stuffing on the documents, so when searched for a query
 term, the first results are always the ones who are optimized.

 So, instead of people getting relevant results, they get spam content
 (highly optimized, keyword stuffed content) as first few results. I have
 tried a couple of things like providing different boosts to different
 fields, but almost everything seems to fail.
[...]

Presumably, they are doing this by increasing tf (term frequency),
i.e., by repeating keywords multiple times. If so, you can use a custom
similarity class that caps term frequency, and/or ensures that the scoring
increases less than linearly with tf. Please see
http://wiki.apache.org/solr/SchemaXml#Similarity , and/or do a web
search for more details.
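
To illustrate the idea only: a real custom similarity has to be written in Java against Lucene's Similarity API, so the Python below merely shows the arithmetic. default_tf mirrors the square-root behaviour of Lucene's default scoring; capped_tf and log_tf are two hypothetical alternatives, and the cap of 3 is an arbitrary value chosen for the example.

```python
import math


def default_tf(freq):
    # Lucene's default: tf grows with the square root of the term frequency.
    return math.sqrt(freq)


def capped_tf(freq, cap=3):
    # Hypothetical variant: repetitions beyond `cap` no longer raise the score.
    return math.sqrt(min(freq, cap))


def log_tf(freq):
    # Hypothetical variant: logarithmic growth, flatter than the square root.
    return 1.0 + math.log(freq) if freq > 0 else 0.0


for freq in (1, 3, 10, 100):
    print(freq, round(default_tf(freq), 3), round(capped_tf(freq), 3), round(log_tf(freq), 3))
```

With the cap, a document that repeats a keyword 100 times gets the same tf contribution as one that mentions it three times, which removes most of the incentive for stuffing.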

Regards,
Gora


Re: Exact match not the first result returned

2011-07-27 Thread Brian Lamb
Thanks Emmanuel for that explanation. I implemented your solution but I'm
not quite there yet. Suppose I also have a record:

RECORD 3
<arr name="myname">
  <str>Fred G. Anderson</str>
  <str>Fred Anderson</str>
</arr>

With your solution, RECORD 1 does appear at the top, but I think that's just
blind luck more than anything else, because RECORD 3 shows as having the same
score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd
like all three records returned with RECORD 1 being the first listing.

Thanks,

Brian Lamb

On Tue, Jul 26, 2011 at 6:03 PM, Emmanuel Espina
espinaemman...@gmail.com wrote:

 That is caused by the size of the documents. The principle is pretty
 intuitive: if one of your documents is the entire three volumes of The Lord
 of the Rings and you search for tree, I know that The Lord of the Rings
 will be in the results, and I haven't memorized the entire text of that
 book :p
 It is a matter of probability: in a big (big!) text, any word has a greater
 chance of being found than in a short letter. So one can infer that the
 letter is more relevant than the big text. That is the principle applied
 here, and Lucene does that when building the ranking.
 The first document is bigger than the second one (remember that all the
 values of a multivalued field are merged into one field in the index, so
 you cannot tell one value apart from another). In the first one you have
 [Fred, coolest, guy, town] and in the second [Fred, Anderson], so the
 second document is more relevant than the first one.

 To avoid all this you can set omitNorms to true, and that should
 make the first document more relevant because Fred appears twice (not
 because Fred appears alone in a value).

 Regards
 Emmanuel

 2011/7/26 Brian Lamb brian.l...@journalexperts.com

  Hi all,
 
  I am a little confused as to why the scoring is working the way it is:
 
  I have a field defined as:
 
  <field name="myname" type="text" indexed="true" stored="true"
  required="false" multivalued="true" />
 
  And I have several documents where that value is:
 
  RECORD 1
  <arr name="myname">
   <str>Fred</str>
   <str>Fred (the coolest guy in town)</str>
  </arr>
 
  OR
 
  RECORD 2
  <arr name="myname">
   <str>Fred Anderson</str>
  </arr>
 
  What happens when I do a search for
  http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2
  returned before RECORD 1.
 
  RECORD 2
  5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
   1.0 = tf(termFreq(myname:Fred)=1)
   8.451541 = idf(docFreq=7306, maxDocs=12586425)
   0.625 = fieldNorm(field=myname, doc=256575)
 
  RECORD 1
  4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
   1.4142135 = tf(termFreq(myname:Fred)=2)
   8.451541 = idf(docFreq=7306, maxDocs=12586425)
   0.375 = fieldNorm(field=myname, doc=215)
 
  So the difference is fieldNorm obviously but I think that's only part
  of the story. Why is RECORD 2 returned with a higher score than RECORD
  1 even though RECORD 1 matches Fred exactly? And how should I do
  this differently so that I am getting the results I am expecting?
 
  Thanks,
 
  Brian Lamb
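
The explain output quoted in this thread can be checked by hand, since each fieldWeight is simply tf * idf * fieldNorm. The sketch below uses the numbers from the thread. It assumes that with omitNorms=true the fieldNorm factor becomes 1.0, and that RECORD 3 has termFreq=2 (one Fred in each of its two values); both assumptions are mine, not output copied from Solr.

```python
import math


def field_weight(tf, idf, field_norm):
    # fieldWeight as shown in the explain output: a product of three factors.
    return tf * idf * field_norm


IDF = 8.451541  # idf(docFreq=7306, maxDocs=12586425), from the thread

# With norms, as in the original explain output.
record2 = field_weight(1.0, IDF, 0.625)
record1 = field_weight(1.4142135, IDF, 0.375)

# With omitNorms=true (assumed fieldNorm of 1.0 for every document).
record1_no_norms = field_weight(math.sqrt(2), IDF, 1.0)
record2_no_norms = field_weight(math.sqrt(1), IDF, 1.0)
record3_no_norms = field_weight(math.sqrt(2), IDF, 1.0)  # Fred occurs twice

print(round(record2, 6), round(record1, 6))
print(record1_no_norms > record2_no_norms, record1_no_norms == record3_no_norms)
```

Without norms, only the term frequency separates the documents, so RECORD 1 and RECORD 3 tie: each contains Fred twice. Putting RECORD 1 reliably first would need an additional signal beyond tf on this field.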
 



Re: Why Slop doesn't match anything?

2011-07-27 Thread Gora Mohanty
On Wed, Jul 27, 2011 at 8:38 PM, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
 Hello pals,

 Using solr 1.4.0. Trying to understand something. When I run the query
 *fieldA:nokia
 c3*, I get 5 results. All with nokia c3, as expected. But when I run
 fieldA:nokia c3~100, I don get any result!

 As far as I understand the ~100 should make my query bring even more
 results as not only documents with nokia c3 in their fieldA will be found.
 Something like nokia blue c3 should match too. Right?
[...]

That does seem odd. You are not using the dismax query handler by
any chance, are you? If so, then the query slop needs to be specified
by adding qs=100 to the query.
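
For the dismax case, a sketch of how such a request could be put together. The field name and phrase come from the question; selecting dismax through defType and using qf=fieldA are assumptions about the setup, and the helper name is invented for the example.

```python
from urllib.parse import urlencode


def dismax_params(user_query, query_fields, slop):
    # qs is the dismax "query slop" applied to phrases the user typed in q.
    return urlencode({
        "defType": "dismax",
        "q": user_query,
        "qf": query_fields,
        "qs": slop,
    })


query_string = dismax_params('"nokia c3"', "fieldA", 100)
```

The quotes around the phrase are percent-encoded by urlencode, so the phrase reaches Solr intact and the slop applies to it.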

Regards,
Gora


Solr Master-slave master failover without data loss

2011-07-27 Thread Nagendraprasad
Suppose the master goes down immediately after index updates, before the
updates have been replicated to the slaves. In that case data loss seems to happen.
Does Solr have any mechanism to deal with that?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Master-slave-master-failover-without-data-loss-tp3203644p3203644.html


Filter content upon indexing

2011-07-27 Thread Rafael Ribeiro
Hi all,

 I am trying to index HTML documents using Solr, and I am having difficulty
extracting certain parts of the main content of the document and storing them
separately in other fields. I saw in the docs that it is possible to
achieve this using XPath, but in my case I need to do a regex match.
 To be more specific, I want to copy content matching a certain pattern to the
title field. My first attempt was to define a custom field type with a
PatternFilter and copy the content field to the title field, but this did not work.
My next attempt was to give the copyField tag pattern and
group attributes, but this did not work either.

 Is it possible to do what I am trying? I would rather not resort to grep
outside Solr, as I am pretty sure Solr is capable of doing what I want...

best regards,
Rafael Ribeiro

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-content-upon-indexing-tp3203946p3203946.html


Jetty Logs - Max line size?

2011-07-27 Thread alexander sulz

Hello

I enabled Jetty Logs but my GET requests seem so long that they get 
truncated and without a line break,

so in the end it looks like this:

notice the logged ping and where it begins.
How can i change this?

thank you very much

000.000.000.000 -  -  [27/Jul/2011:17:38:04 +0100] GET 
/solr/select?fl=id%2Cdoc_feature%2Cmandant%2Cd_id%2Cdoc_title%2Cdoc_id%2Cscore%2Ccategory%2Canrede%2Ckeyword_a%2Ckeyword_a_name%2Cmenu_id%2Cprice%2Caprice%2Cstandort_id%2Cdate_start%2Cdate_end%2Cdlpath%2Cdownloads%2Cdoc_type%2Cdoc_id%2Cmenu_path_text%2Ctitle%2Csummary%2Csubtitle%2Cdate_online%2Ctables%2Ckdnrzen%2Cplz%2Cort%2Cadresse%2Cbundesland%2Ctelefonnummer%2Cmobil%2Cfax%2Cemail%2Cnachname%2Cvorname%2Chauptfunktion%2Cbereichname%2Cfilialenamesort=date_online+descrows=0version=1.2wt=jsonjson.nl=mapq=text_copy%3A%28schuhe%29+%28category%3A%22Lagerhaus%22+OR+%28category%3ASortiment+AND+mandant%3A%28%22Lagerhaus%22+OR+Portal%29%29+OR+kdnrzen%3A%28%2A%29%29+AND+-%28doc_feature%3Abroschuere+AND+doc_type%3Acontent%29+AND+-%28menu_path_text%3A%2ATables%2A%29+AND+-%28id%3Acld_%2A%29+date_online%3A%5B2006-06-21T00%3A00%3A00Z+TO+2011-7-27T23%3A59%3A59Z%5D+date_offline%3A%5B2011-7-27T00%3A00%3A00Z+TO+%2A%5D+%28doc_type%3Acontent+AND+-doc_feature%3A%28ange000.000.000.000 
-  -  [27/Jul/2011:17:38:04 +0100] HEAD /solr/admin/ping HTTP/1.0 200 0




Re: Autocomplete with Solr 3.1

2011-07-27 Thread scorpking
Hi Klein,
Thanks for your reply. I tried some of the suggestions with Solr, and the
results returned are good. However, I want to use a search component with
Solr 3.1, and I am now having some problems with the Suggester. I think the
problem may be in my schema file. This is the schema file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And I defined these fields:
<field name="s_SongId" type="string" indexed="true" stored="true"/>
<field name="s_SongName" type="text" indexed="true" stored="true"/>
<field name="search_autocomplete" type="text_auto" indexed="true"
       stored="true" multiValued="true"/>

where the text_auto fieldType is:
<fieldType class="solr.TextField" name="text_auto"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
In solrconfig.xml I defined:
<searchComponent name="spellcheck-autocomplete"
                 class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">search_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/autocomplete"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>spellcheck-autocomplete</str>
  </arr>
</requestHandler>

Can anyone help?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3204176.html
Sent from the Solr - User mailing list archive at Nabble.com.


schema.xml changes, need re-indexing ?

2011-07-27 Thread Charles-Andre Martin
Hi,

 

We currently have a big index in production. We would like to add 2
non-required fields to our schema.xml:

<field name="myfield" type="boolean" indexed="true" stored="true"
       required="false"/>
<field name="myotherfield" type="string" indexed="true" stored="true"
       required="false" multiValued="true"/>

I made some tests:

-  I stopped Tomcat
-  I changed the schema.xml
-  I started Tomcat

The data was still there and I was able to add new documents with these 2
fields.

So far, it looks like I won't need to re-index all my data. Am I right? Do I
need to re-index all my data, or am I fine in this case?

 

Thank you !

 

Charles-André Martin



Re: Filter content upon indexing

2011-07-27 Thread Emmanuel Espina
If you can express what you want with a regular expression then the pattern
filter should work! I suspect that you tokenized the field and that this
destroyed the structure of the HTML.

I would use a contents field analyzed with an HTMLStripCharFilterFactory
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory)
and use copyField to copy it to another field, Title, that uses a
KeywordTokenizer in combination with PatternFilter (with the pattern of the
title of your pages).

Thanks
Emmanuel

2011/7/27 Rafael Ribeiro rafae...@gmail.com

 Hi all,

  I am trying to index html documents using Solr and I am having
 difficulties
 to extract certain parts of the main content of the document and store them
 sepparately into other fields. I saw on the docs that it is possible to
 achieve this using xpath but in my certain case I need to do a regex match.
  To be more specifical I am willing to copy a certain pattern content to
 title field. My first attempt was to define a custom field type with a
 PatternFilter and copy content field to title field but this did not work.
 Next attempt was to specify that copyField tag would have a pattern and
 group attributes but this did not work as well.

  Is it possible to do what I am trying? I am unwilling to resort to grep
 outside Solr as I am pretty sure Solr is capable of doing what I want...

 best regards,
 Rafael Ribeiro

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Filter-content-upon-indexing-tp3203946p3203946.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Data Import Handler Architecture Diagram

2011-07-27 Thread solruser@9913
Maybe I am looking at the wrong version - the diagram (and the screenshot in
the interactive dev mode section)  don't show up in the WIKI page.  

http://wiki.apache.org/solr/DataImportHandler#Architecture

Is this a wrong link?

I did an inspect element and this is what I see ...

/solr/DataImportHandler?action=AttachFile&amp;do=get&amp;target=DataImportHandlerOverview.png
 

-g

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Architecture-Diagram-tp3204459p3204459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Data Import Handler Diagram

2011-07-27 Thread solruser@9913
Maybe I am looking at the wrong version - the diagram (and the screenshot in
the interactive dev mode section)  don't show up in the WIKI page.   

http://wiki.apache.org/solr/DataImportHandler#Architecture

Is this a wrong link? 

I did an inspect element and this is what I see ... 

 ... <img alt="DataImportHandlerOverview.png" class="attachment"
src="/solr/DataImportHandler?action=AttachFile&amp;do=get&amp;target=DataImportHandlerOverview.png"
title="DataImportHandlerOverview.png">

-g

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Diagram-tp3204470p3204470.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: schema.xml changes, need re-indexing ?

2011-07-27 Thread Michael Ryan
You should be fine - no need to re-index your data.

Adding and removing fields is generally safe to do without a re-index. Changing 
a field (its type, analyzers, etc) requires more caution and generally does 
require a re-index.

-Michael


Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread Alexei Martchenko
I believe you're fine with that. You don't need to reindex the whole Solr index.

2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca

 Hi,



 We currently have a big index in production. We would like to add 2
 non-required fields to our schema.xml :



 field name=myfield type=boolean indexed=true stored=true
 required=false/

 field name=myotherfield type=string indexed=true stored=true
 required=false multiValued=true/



 I made some tests:



 -  I stopped tomcat

 -  I changed the schema.xml

 -  I started tomcat



 The data was still there and I was able to add new document with theses 2
 fields.



 So far, it looks I won't need to re-index all my data. Am I right ? Do I
 need to re-index all my data or in that case I'm fine ?



 Thank you !



 Charles-André Martin




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Filter content upon indexing

2011-07-27 Thread Emmanuel Espina
I want to add that, since the stored text (unlike the indexed text) is not
analyzed, if you retrieve the title you will get all the HTML. If you want to
extract the title for storage in a separate field, that will have to be done
with a different tool, not just with the analysis chain. My previous answer
focused only on extracting text for searching purposes.
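
As a rough illustration of doing the extraction with a different tool before
the document reaches Solr, the sketch below pulls a title out of raw HTML with
a regular expression in Python. The pattern (the contents of the <title>
element) and the field names "content" and "title" are assumptions made for
the example; they are not taken from Rafael's setup.

```python
import re
from html import unescape

# Assumed pattern: the title is whatever sits inside <title>...</title>.
# Swap in the regex that matches the pages actually being indexed.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the first regex match as plain text, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse whitespace and decode entities so the stored value is clean.
    return unescape(" ".join(match.group(1).split()))


def build_document(doc_id, html):
    """Build a dict with hypothetical field names, ready to be sent to Solr."""
    return {"id": doc_id, "content": html, "title": extract_title(html)}
```

The resulting dict could then be posted with whatever client is already in
use; the point is only that the stored title is computed before indexing
rather than by the analyzer.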

Thanks
Emmanuel

2011/7/27 Emmanuel Espina espinaemman...@gmail.com

 If you can express what you want with a regular expression then the pattern
 Filter should work! I'm thinking that maybe you tokenized the field and that
 invalidated the structure of the html.

 I would use a contents field analized with a
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 and use copyField to another field Title that has a KeywordTokenizer in
 combination with PatternFilter (with the pattern of the title of your pages)

 Thanks
 Emmanuel


 2011/7/27 Rafael Ribeiro rafae...@gmail.com

 Hi all,

  I am trying to index html documents using Solr and I am having
 difficulties
 to extract certain parts of the main content of the document and store
 them
 sepparately into other fields. I saw on the docs that it is possible to
 achieve this using xpath but in my certain case I need to do a regex
 match.
  To be more specifical I am willing to copy a certain pattern content to
 title field. My first attempt was to define a custom field type with a
 PatternFilter and copy content field to title field but this did not work.
 Next attempt was to specify that copyField tag would have a pattern and
 group attributes but this did not work as well.

  Is it possible to do what I am trying? I am unwilling to resort to grep
 outside Solr as I am pretty sure Solr is capable of doing what I want...

 best regards,
 Rafael Ribeiro

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Filter-content-upon-indexing-tp3203946p3203946.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Indexing SharePoint from SolrJ

2011-07-27 Thread Twomey, David

Does anyone have examples of indexing SharePoint content using the Google
Connectors API together with SolrJ?

I know Lucid Imagination has a SharePoint connector, and I have used that
successfully.

However, I would like to create thumbnail images of PDF and PPT documents and
add them to my index, and I assume I need to use SolrJ and some third-party
libraries to do that. Hence I want to crawl SharePoint using SolrJ so I can
then call the third-party libraries at index time.


Thanks so much
David





Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
Anyone tried this? I can not start Solr-Tomcat with following options on
Ubuntu:

JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -Xmn256m -XX:MaxPermSize=256m"
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/data/solr -Dfile.encoding=UTF8
-Duser.timezone=GMT
-Djava.util.logging.config.file=/data/solr/logging.properties
-Djava.net.preferIPv4Stack=true"
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode  -XX:+AggressiveOpts -XX:NewSize=64m
-XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=77
-XX:+CMSParallelRemarkEnabled"
JAVA_OPTS="$JAVA_OPTS -verbose:gc  -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:/data/solr/solr-gc.log"


Tomcat log (something about PorterStemFilter; Solr 3.3.0):

INFO: Server startup in 2683 ms
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f5c6f36716e, pid=7713, tid=140034519381760
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.PorterStemFilter.incrementToken()Z
#
[thread 140034523637504 also had an error]
[thread 140034520434432 also had an error]
# An error report file with more information is saved as:
# [thread 140034520434432 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#



However, I can start it and run it without any problems after removing
-XX:+AggressiveOpts (which is supposed to become the default setting in
upcoming Java 6 releases).



Do we need to disable escape analysis (-XX:-DoEscapeAnalysis), as IBM suggests?
http://www-01.ibm.com/support/docview.wss?uid=swg21422605



Thanks,
Fuad Efendi

http://www.tokenizer.ca




Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Robert Muir
Don't use this option, these optimizations are buggy:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134


On Wed, Jul 27, 2011 at 3:56 PM, Fuad Efendi f...@efendi.ca wrote:
 Anyone tried this? I can not start Solr-Tomcat with following options on
 Ubuntu:

 JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -Xmn256m -XX:MaxPermSize=256m
 JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/data/solr -Dfile.encoding=UTF8
 -Duser.timezone=GMT
 -Djava.util.logging.config.file=/data/solr/logging.properties
 -Djava.net.preferIPv4Stack=true
 JAVA_OPTS=$JAVA_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSIncrementalMode  -XX:+AggressiveOpts -XX:NewSize=64m
 -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=77
 -XX:+CMSParallelRemarkEnabled
 JAVA_OPTS=$JAVA_OPTS -verbose:gc  -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps -Xloggc:/data/solr/solr-gc.log


 Tomcat log (something about PorterStemFilter; Solr 3.3.0):

 INFO: Server startup in 2683 ms
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x7f5c6f36716e, pid=7713, tid=140034519381760
 #
 # JRE version: 6.0_26-b03
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode
 linux-amd64 compressed oops)
 # Problematic frame:
 # J  org.apache.lucene.analysis.PorterStemFilter.incrementToken()Z
 #
 [thread 140034523637504 also had an error]
 [thread 140034520434432 also had an error]
 # An error report file with more information is saved as:
 # [thread 140034520434432 also had an error]
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 #



 However, I can start it and run without any problems by removing
 -XX:+AggressiveOpts (which has to be default setting in upcoming releases
 Java 6)



 Do we need to disable -XX:-DoEscapeAnalysis as IBM suggests?
 http://www-01.ibm.com/support/docview.wss?uid=swg21422605



 Thanks,
 Fuad Efendi

 http://www.tokenizer.ca






-- 
lucidimagination.com


An idea for an intersection type of filter query

2011-07-27 Thread Shawn Heisey
I've been looking at the slow queries our Solr installation is 
receiving.  They are dominated by queries with a simple q parameter 
(often *:* for all docs) and a VERY complicated fq parameter.  The 
filter query is built by going through a set of rules for the user and 
putting together each rule's query clause separated by OR -- we can't 
easily break it into multiple filters.


In addition to causing queries themselves to run slowly, this causes 
large autowarm times for our filterCache -- my filterCache autowarmCount 
is tiny (4), but it sometimes takes 30 seconds to warm.


I've seen a number of requests here for the ability to have multiple fq 
parameters ORed together.  This is probably possible, but in the 
interests of compatibility between versions, very impractical.  What if 
a new parameter was introduced?  It could be named fqi, for filter query 
intersection.  To figure out the final bitset for multiple fq and fqi 
parameters, it would use this kind of logic:


fq AND fq AND fq AND (fqi OR fqi OR fqi)

This would let us break our filters into manageable pieces that can 
efficiently populate the filterCache, and they would autowarm quickly.


Is the filter design in Solr separated cleanly enough to make this at 
all reasonable?  I'm not a Java developer, so I'd have a tough time 
implementing it myself.  When I have a free moment I will take a look at 
the code anyway.  I'm trying to teach myself Java.


Thanks,
Shawn



Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread François Schiettecatte
I have not seen this mentioned anywhere, but I found a useful 'trick' to 
restart solr without having to restart tomcat. All you need to do is 'touch' 
the solr.xml in the solr.home directory. It can take a few seconds but solr 
will restart and reload any config.

Cheers

François 

On Jul 27, 2011, at 2:56 PM, Alexei Martchenko wrote:

 I believe you're fine with that. Don't need to reindex all solr database.
 
 2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca
 
 Hi,
 
 
 
 We currently have a big index in production. We would like to add 2
 non-required fields to our schema.xml :
 
 
 
 field name=myfield type=boolean indexed=true stored=true
 required=false/
 
 field name=myotherfield type=string indexed=true stored=true
 required=false multiValued=true/
 
 
 
 I made some tests:
 
 
 
 -  I stopped tomcat
 
 -  I changed the schema.xml
 
 -  I started tomcat
 
 
 
 The data was still there and I was able to add new document with theses 2
 fields.
 
 
 
 So far, it looks I won't need to re-index all my data. Am I right ? Do I
 need to re-index all my data or in that case I'm fine ?
 
 
 
 Thank you !
 
 
 
 Charles-André Martin
 
 
 
 
 -- 
 
 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533



Re: An idea for an intersection type of filter query

2011-07-27 Thread Shawn Heisey

On 7/27/2011 2:00 PM, Shawn Heisey wrote:
I've seen a number of requests here for the ability to have multiple 
fq parameters ORed together.  This is probably possible, but in the 
interests of compatibility between versions, very impractical.  What 
if a new parameter was introduced?  It could be named fqi, for filter 
query intersection.  To figure out the final bitset for multiple fq 
and fqi parameters, it would use this kind of logic:


fq AND fq AND fq AND (fqi OR fqi OR fqi)


Thinking about this after I sent it, I realized that I don't mean 
intersection, that's what filter queries already do. :)  I meant union, 
so fqu would be a better parameter name.


Shawn



Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
Thanks Robert!!!

Submitted On 26-JUL-2011 - yesterday.

This option was popular in HBase...


On 11-07-27 3:58 PM, Robert Muir rcm...@gmail.com wrote:

Don't use this option, these optimizations are buggy:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134


On Wed, Jul 27, 2011 at 3:56 PM, Fuad Efendi f...@efendi.ca wrote:
 Anyone tried this? I can not start Solr-Tomcat with following options on
 Ubuntu:

 JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -Xmn256m -XX:MaxPermSize=256m
 JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/data/solr -Dfile.encoding=UTF8
 -Duser.timezone=GMT
 -Djava.util.logging.config.file=/data/solr/logging.properties
 -Djava.net.preferIPv4Stack=true
 JAVA_OPTS=$JAVA_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSIncrementalMode  -XX:+AggressiveOpts -XX:NewSize=64m
 -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=77
 -XX:+CMSParallelRemarkEnabled
 JAVA_OPTS=$JAVA_OPTS -verbose:gc  -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps -Xloggc:/data/solr/solr-gc.log


 Tomcat log (something about PorterStemFilter; Solr 3.3.0):

 INFO: Server startup in 2683 ms
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x7f5c6f36716e, pid=7713, tid=140034519381760
 #
 # JRE version: 6.0_26-b03
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode
 linux-amd64 compressed oops)
 # Problematic frame:
 # J  org.apache.lucene.analysis.PorterStemFilter.incrementToken()Z
 #
 [thread 140034523637504 also had an error]
 [thread 140034520434432 also had an error]
 # An error report file with more information is saved as:
 # [thread 140034520434432 also had an error]
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 #



 However, I can start it and run without any problems by removing
 -XX:+AggressiveOpts (which has to be default setting in upcoming
releases
 Java 6)



 Do we need to disable -XX:-DoEscapeAnalysis as IBM suggests?
 http://www-01.ibm.com/support/docview.wss?uid=swg21422605



 Thanks,
 Fuad Efendi

 http://www.tokenizer.ca






-- 
lucidimagination.com




RE: Spellcheck compounded words

2011-07-27 Thread Dyer, James
I could not reproduce the problem, even with the two parameters you show below
added to the default handler.  I tried using this default handler with
different queries containing correct & incorrect terms.  I made sure it would
sometimes successfully create collations and other times try to create
collations but not find any good ones.  In all cases everything worked as
expected.

I also checked the code to see whether it could possibly create an infinite
loop in which the queries that run to check a collation's validity were
themselves getting spell corrections back.  But this doesn't look like a
possibility.

If you are able to figure out anything more on this yourself, then please
post.  If this is a real bug, then we ought to get it fixed.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Wednesday, July 27, 2011 9:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck compounded words

All the talk about logging derailed the thread. So can someone test whether adding

  <str name="spellcheck.maxCollations">2</str>
  <str name="spellcheck.maxCollationTries">2</str>

to the default requestHandler in solrconfig.xml while using collations causes
the system to hang?


O. Klein wrote:
 
 Anyways. I was testing on 3.3 and found that when I added
 spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 as parameters
 to the URL there was no problem at all.
 
 Adding 
 
   <str name="spellcheck.maxCollations">2</str>
   <str name="spellcheck.maxCollationTries">2</str>
 
 to the default requestHandler in solrconfig.xml caused request to hang.
 
 Can someone verify if this is a bug?
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3203569.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Robert Muir
On Wed, Jul 27, 2011 at 4:12 PM, Fuad Efendi f...@efendi.ca wrote:
 Thanks Robert!!!

 Submitted On 26-JUL-2011 - yesterday.

 This option was popular in HBase...

Then you should tell them also, not to use it, if they want their loops to work.

-- 
lucidimagination.com


Re: Indexing SharePoint from SolrJ

2011-07-27 Thread Glen Newton
+1

On 7/27/11, Twomey, David david.two...@novartis.com wrote:

 Does anyone have examples of indexing SP content using the Google Connectors
 API and using SolrJ.

 I know Lucid Imagination has a Sharepoint connector and I have used that
 successfully.

 However,  I would like to create a thumbnail image of PDF's and PPT docs and
 add that to my index and I assume I need to use solrJ and some third party
 libraries  to do that.  Hence I want to crawl Sharepoint using SolrJ so I
 can then call third party libraries at index time.


 Thanks so much
 David





-- 
Sent from my mobile device




Re: An idea for an intersection type of filter query

2011-07-27 Thread Jonathan Rochkind
I don't know the answer to feasibility either, but I'll just point out 
that boolean OR corresponds to set union, not set intersection.  
So I think you probably mean a 'union' type of filter query; 
'intersection' does not seem to describe what you are describing; 
ordinary 'fq' values are 'intersected' already to restrict the result 
set, no?


So, anyhow, the basic goal, if I understand it right, is not to provide 
any additional semantics, but to allow individual clauses in an 'fq' 
OR to be cached and looked up in the filter cache individually.


Perhaps someone (not me) who understands the Solr architecture better 
might also have another suggestion for how to get to that goal, other 
than the specific thing you suggested. I do not know, sorry.


Hmm, but I start thinking, what about a general purpose mechanism to 
identify a sub-clause that should be fetched/retrieved from the filter 
cache. I don't _think_ current nested queries will do that:


fq=_query_:"foo:bar" OR _query_:"foo:baz"

That's legal now (and doesn't accomplish much) -- but what if the 
individual subquery components could consult the filter cache 
separately?  I don't know if nested query is the right way to do that or 
not, but I'm thinking of some mechanism where you could arbitrarily 
identify clauses that should be filter cached independently.


Jonathan

On 7/27/2011 4:00 PM, Shawn Heisey wrote:
I've been looking at the slow queries our Solr installation is 
receiving.  They are dominated by queries with a simple q parameter 
(often *:* for all docs) and a VERY complicated fq parameter.  The 
filter query is built by going through a set of rules for the user and 
putting together each rule's query clause separated by OR -- we can't 
easily break it into multiple filters.


In addition to causing queries themselves to run slowly, this causes 
large autowarm times for our filterCache -- my filterCache 
autowarmCount is tiny (4), but it sometimes takes 30 seconds to warm.


I've seen a number of requests here for the ability to have multiple 
fq parameters ORed together.  This is probably possible, but in the 
interests of compatibility between versions, very impractical.  What 
if a new parameter was introduced?  It could be named fqi, for filter 
query intersection.  To figure out the final bitset for multiple fq 
and fqi parameters, it would use this kind of logic:


fq AND fq AND fq AND (fqi OR fqi OR fqi)

This would let us break our filters into manageable pieces that can 
efficiently populate the filterCache, and they would autowarm quickly.


Is the filter design in Solr separated cleanly enough to make this at 
all reasonable?  I'm not a Java developer, so I'd have a tough time 
implementing it myself.  When I have a free moment I will take a look 
at the code anyway.  I'm trying to teach myself Java.


Thanks,
Shawn




Re: Speeding up search by combining common sub-filters

2011-07-27 Thread Jonathan Rochkind
I'm pretty sure Solr/Lucene have no such optimization already, but 
it's not clear to me that it would result in much of a performance 
benefit. Because of the way Lucene works, it's not obvious to me 
that the second version of your query will be noticeably faster than the 
first version.


Maybe it would help in cases with many, many clauses, rather than the few 
clauses in your example. You'd definitely want to performance test it to 
verify that there are any gains before embarking on writing the 
'optimization'. You can test it just by sending the different versions of 
your real-world queries to Solr and seeing what the response times are, 
calculating the hypothetically 'optimized' version yourself by hand if 
need be, right?
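
A minimal sketch of that kind of comparison, in Python, might look like the
following. The helper names (median_latency, compare) are made up for the
example, and each query is passed in as a plain callable so that the sketch
makes no assumption about how the request is actually sent to Solr.

```python
import statistics
import time


def median_latency(run_query, repeats=5):
    """Call run_query() `repeats` times; return the median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(original, optimized, repeats=5):
    """Return (original_median, optimized_median, relative_change)."""
    base = median_latency(original, repeats)
    new = median_latency(optimized, repeats)
    # A negative relative_change means the optimized variant was faster.
    change = (new - base) / base if base > 0 else 0.0
    return base, new, change
```

In practice each callable would issue one HTTP request with one version of the
filter. Repeated runs of the same filter may be answered from Solr's caches,
so the numbers are only meaningful if both variants are measured under the
same cache conditions.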




On 7/27/2011 5:05 PM, Scott Smith wrote:

We have a solr application which ends up creating queries with very complicated 
filters (literally hundreds and sometimes thousands of terms, typically a large 
number of terms OR'ed together where each of these terms might have half a 
dozen keywords ANDed/ORed together).  In looking at the filters, I realized 
that there are often a lot of common sub-filters.

A simple example of what I mean is:

 (cat AND dog) OR (cat AND horse)

This could clearly be simplified by saying:

 cat AND (dog OR horse)
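
The equivalence of the two forms can be checked on a toy example. The Python
sketch below uses plain sets of document ids as a stand-in for the matching
documents of each term; the index contents are invented for illustration and
say nothing about how Lucene evaluates either form internally.

```python
def matches(index, term):
    """Return the set of document ids containing term (empty set if unseen)."""
    return index.get(term, set())


def original_filter(index):
    # (cat AND dog) OR (cat AND horse)
    cat, dog, horse = (matches(index, t) for t in ("cat", "dog", "horse"))
    return (cat & dog) | (cat & horse)


def factored_filter(index):
    # cat AND (dog OR horse)
    cat, dog, horse = (matches(index, t) for t in ("cat", "dog", "horse"))
    return cat & (dog | horse)


# Toy inverted index: term -> ids of the documents that contain it.
index = {"cat": {1, 2, 3, 5}, "dog": {2, 4}, "horse": {3, 4, 5}}
```

Both functions select the same documents; the factored form simply intersects
with the "cat" set once instead of twice.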

It turns out that finding and combining common sub-filters isn't trivial for our 
application.  So, before I start a project to attempt some kind of 
optimization, my question is whether it's likely that I will see significant 
decreases in query times to justify the development effort it takes to optimize the 
filters.  Certainly, if I thought I might get a 20%+ decrease in time, I would say it's 
probably a good project.  If it's just a few percentage points of improvement, then I'm 
less excited about doing it.

Does Solr already go through some kind of optimization which effectively 
combines common sub-filters and possibly duplicated terms?  Does anyone have 
any thoughts on this subject?

Thanks

Scott



Re: An idea for an intersection type of filter query

2011-07-27 Thread Shawn Heisey

On 7/27/2011 3:49 PM, Jonathan Rochkind wrote:
I don't know the answer to feasibilty either, but I'll just point out 
that boolean OR corresponds to set union, not set intersection.  
So I think you probably mean a 'union' type of filter query; 
'intersection' does not seem to describe what you are describing; 
ordinary 'fq' values are 'intersected' already to restrict the result 
set, no?


You're right, I noticed that later and corrected myself.  Substitute fqu 
(and try not to pronounce it) for fqi in my previous message.  This is 
the only name suggestion I could come up with on short notice, and it's 
probably a good idea to change it.


So, anyhow, the basic goal, if I understand it right, is not to 
provide any additional semantics, but to allow individual clauses in 
an 'fq' OR to be cached and looked up in the filter cache individually.


I would like to have both intersection and union at the same time, not 
be restricted to one or the other, and have it be possible without 
altering existing functionality.  The idea is to just add a new 
parameter that just changes how the resulting bitset is applied to the 
query results.  The filterCache entry would look the same whether you 
used fq or fqu.  Restating my suggested bitset logic with the changed 
parameter name:


fq AND fq AND fq AND (fqu OR fqu OR fqu)

It would be awesome to have a syntax that creates arbitrarily complex 
and nested AND/OR combinations, but that would be a MAJOR undertaking.  
The logic I've mentioned above seems to be the most useful you could get 
with just having the one additional parameter.  You can get pure union 
by just using fqu.  The existing model of pure intersection would be 
maintained when only fq is present.
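
The proposed logic can be sketched in a few lines of Python. This is only an
illustration of the set arithmetic: fqu is the hypothetical parameter from
this thread and does not exist in Solr, plain sets of document ids stand in
for the cached bitsets, and combine_filters is a name invented for the
example.

```python
def combine_filters(all_docs, fq_sets, fqu_sets):
    """Apply fq AND fq AND ... AND (fqu OR fqu OR ...) to sets of document ids."""
    result = set(all_docs)
    for docs in fq_sets:
        # Every fq narrows the result, as filter queries do today.
        result &= docs
    if fqu_sets:
        # The fqu filters are unioned first, then applied as one more filter.
        result &= set().union(*fqu_sets)
    return result
```

With no fqu sets the function reduces to pure intersection, and with only fqu
sets it yields their union, matching the two special cases described above.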


Thanks,
Shawn



Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread Alexei Martchenko
I always run
http://localhost:8983/solr/admin/cores?action=RELOAD&core=corename in the
browser when I want to reload Solr and pick up any changes in the config XML files.
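
For scripting the same reload instead of using the browser, a small Python
sketch can build the URL. The base URL and core name below are the
placeholders from the message above, core_reload_url is a name invented for
the example, and the sketch assumes a multi-core setup where the CoreAdmin
handler is reachable under /admin/cores.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core_name):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core_name})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"
```

The returned string can be passed to urllib.request.urlopen or to curl to
trigger the reload.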

2011/7/27 François Schiettecatte fschietteca...@gmail.com

 I have not seen this mentioned anywhere, but I found a useful 'trick' to
 restart solr without having to restart tomcat. All you need to do is 'touch'
 the solr.xml in the solr.home directory. It can take a few seconds but solr
 will restart and reload any config.

 Cheers

 François

 On Jul 27, 2011, at 2:56 PM, Alexei Martchenko wrote:

  I believe you're fine with that. Don't need to reindex all solr database.
 
  2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca
 
  Hi,
 
 
 
  We currently have a big index in production. We would like to add 2
  non-required fields to our schema.xml :
 
 
 
  field name=myfield type=boolean indexed=true stored=true
  required=false/
 
  field name=myotherfield type=string indexed=true stored=true
  required=false multiValued=true/
 
 
 
  I made some tests:
 
 
 
  -  I stopped tomcat
 
  -  I changed the schema.xml
 
  -  I started tomcat
 
 
 
  The data was still there and I was able to add new document with theses
 2
  fields.
 
 
 
  So far, it looks I won't need to re-index all my data. Am I right ? Do I
  need to re-index all my data or in that case I'm fine ?
 
 
 
  Thank you !
 
 
 
  Charles-André Martin
 
 
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533




colocated term stats

2011-07-27 Thread Twomey, David

Given a query term, is it possible to get the top 10 collocated terms from 
the index?

ie:  return the top 10 terms that appear with this term based on doc count.

A plus would be to add some constraints on how near the terms are in the docs.
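
To make the request concrete, here is a brute-force, in-memory sketch of the computation being asked for. It is not a Solr or Lucene feature; the function name, the token-list input format, and the `window` parameter are all assumptions made up for illustration.

```python
from collections import Counter


def top_cooccurring_terms(docs, term, k=10, window=None):
    """Rank other terms by the number of docs in which they appear with `term`.

    docs   -- list of token lists, one list per document (assumed input format)
    window -- if given, only count terms within that many positions of `term`
    """
    counts = Counter()
    for tokens in docs:
        positions = [i for i, t in enumerate(tokens) if t == term]
        if not positions:
            continue  # this doc does not contain the query term
        if window is None:
            nearby = set(tokens)
        else:
            nearby = set()
            for p in positions:
                nearby.update(tokens[max(0, p - window):p + window + 1])
        nearby.discard(term)
        counts.update(nearby)  # a set, so each term counts once per doc
    return counts.most_common(k)
```

Calling it with `window=None` gives plain doc-count co-occurrence; a small `window` approximates the "how near the terms are" constraint.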





Re: Data Import Handler Architecture Diagram

2011-07-27 Thread Chris Hostetter

: Maybe I am looking at the wrong version - the diagram (and the screenshot in
: the interactive dev mode section)  don't show up in the WIKI page.  
: 
: http://wiki.apache.org/solr/DataImportHandler#Architecture
: 
: Is this a wrong link?

Ugh.

a while back the Infra team disabled attachments in the wiki because of 
spam.  Attachments are all still in the system on disk somewhere, but you 
can't view, edit, or replace them...

https://issues.apache.org/jira/browse/INFRA-3634

...i thought the only major use of attachments on the Solr wiki was the 
eclipse project zip files (which are now available in dev-tools) but i 
didn't realize there were any diagrams as well.

Nothing anyone except Infra can really do about it.

-Hoss


Re: Exact match not the first result returned

2011-07-27 Thread Chris Hostetter

: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the 
tf() matters, and both docs contain the term frank exactly twice.

the reason RECORD1 isn't scoring higher even though it contains (as you 
put it) "matchings 'Fred' exactly" is that from a term perspective, RECORD1 
doesn't actually match myname:Fred exactly, because there are in fact 
other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where entire field 
values match your input (ie: RECORD1 but no other records) would be to 
use a StrField instead of a TextField or an analyzer that doesn't split up 
tokens (ie: something using KeywordTokenizer).  that way a query on 
myname:Frank would not match a document where you had indexed the value 
"Frank Stalone" but a query for myname:"Frank Stalone" would.
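
a rough sketch of that difference, in plain Python rather than Solr: the two functions below are stand-ins I made up, and the whitespace-split-plus-lowercase "analysis" is only an assumption about what a typical tokenized text field does.

```python
def tokenized_match(field_value, query):
    """Stand-in for a tokenized text field: any single token may match."""
    return query.lower() in field_value.lower().split()


def whole_value_match(field_value, query):
    """Stand-in for a string field: the entire stored value must equal the query."""
    return field_value == query


value = "Frank Stalone"
print(tokenized_match(value, "Frank"))            # token-level match
print(whole_value_match(value, "Frank"))          # no match on a partial value
print(whole_value_match(value, "Frank Stalone"))  # match on the full value
```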

in your case, you don't want *only* the exact field value matches, but you 
want them boosted, so you could do something like copyField myname into 
myname_str and then do...

  q=+myname:Frank myname_str:Frank^100

...in which case a match on myname is required, but a match on 
myname_str will greatly increase the score.

dismax (and edismax) are really designed for situations like this...

  defType=dismax  qf=myname  pf=myname_str^100  q=Frank
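
if you are sending these from code, the '+', '^' and spaces need URL-encoding. a minimal sketch, assuming the field names used above (myname, myname_str) and single-word terms; the two helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def boosted_lucene_params(term, text_field="myname", exact_field="myname_str", boost=100):
    """Require a match on the text field, boost a match on the string copy."""
    q = f"+{text_field}:{term} {exact_field}:{term}^{boost}"
    return urlencode({"q": q})


def dismax_params(term, text_field="myname", exact_field="myname_str", boost=100):
    """Same idea expressed with the dismax qf/pf parameters."""
    return urlencode({
        "defType": "dismax",
        "qf": text_field,
        "pf": f"{exact_field}^{boost}",
        "q": term,
    })


encoded = boosted_lucene_params("Frank")
print(encoded)
print(parse_qs(encoded)["q"][0])  # decodes back to the readable query
print(dismax_params("Frank"))
```

multi-word terms would need phrase quoting in the first form, which this sketch does not attempt.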



-Hoss


Re: Dealing with keyword stuffing

2011-07-27 Thread Chris Hostetter

: Presumably, they are doing this by increasing tf (term frequency),
: i.e., by repeating keywords multiple times. If so, you can use a custom
: similarity class that caps term frequency, and/or ensures that the scoring
: increases less than linearly with tf. Please see

in particular, using something like SweetSpotSimilarity tuned to know what 
values make sense for good content in your domain can be useful because 
it can actually penalize documents that are too short/long or have term 
freqs that are outside of a reasonable expected range.
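
to illustrate the "cap the term frequency" idea from the quoted text, here is a toy sketch in Python. it is NOT SweetSpotSimilarity's actual formula, just the general shape: the square-root tf mirrors what Lucene's classic default similarity does, and the cap value of 5 is an arbitrary assumption.

```python
import math


def default_tf(freq):
    """Square-root tf: grows with every repeat, only more slowly."""
    return math.sqrt(freq)


def capped_tf(freq, max_freq=5):
    """Hypothetical capped tf: repeats beyond max_freq add nothing to the score."""
    return math.sqrt(min(freq, max_freq))


for freq in (1, 4, 25, 100):
    print(freq, default_tf(freq), capped_tf(freq))
```

in a real setup this logic would live in a custom Similarity class on the Java side; the right cap depends entirely on what normal documents in your index look like.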

FWIW though: that's really just a generic answer to a generic question.  
the better you understand your data, the better you can configure solr for 
it -- and that goes equally for the advice people can give you about how 
to configure solr.  you haven't given any information about the nature of 
your data: the types of documents, the authoritative source, the fields 
involved, where/how/when people edit this data, who is keyword spamming, 
etc.; or how you want to use it: what types of queries you need to 
support, what your users' objectives are, etc.  That makes it impossible 
for anyone to suggest anything but the most general answer: "customize 
your Similarity".

-Hoss


RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
 
Thanks for your reply Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows  XP Professional Version 2002 Service Pack 3

Interestingly my colleagues who have the same environment are not facing this 
problem.

Thanks & Regards
Anand Nigam

-----Original Message-----
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: 27 July 2011 20:21
To: solr-user@lucene.apache.org
Subject: RE: Problem starting solr on jetty

Hi Anand,

Someone else reported this exact same error with Solr v1.4.0: 
http://www.lucidimagination.com/search/document/fd5b83f3595a1c6c/can_t_start_solr_by_java_jar_start_jar

I downloaded the apache-solr-3.3.0.zip, unpacked it, then ran 'java -jar 
start.jar' from the cmdline.  It worked.  (Windows 7; Oracle Java 1.6.0_23).

I tried to reproduce the error you're seeing, by making the example\ directory 
and all its contents read-only (different exception: FileNotFound), and by 
removing the entire contents of the example\ directory except for start.jar 
(nothing happens - it just quits without printing anything out).

Can you give more details about your environment?

Steve

-----Original Message-----
From: anand.ni...@rbs.com [mailto:anand.ni...@rbs.com]
Sent: Wednesday, July 27, 2011 7:25 AM
To: solr-user@lucene.apache.org
Subject: Problem starting solr on jetty

Hi,

I am new to solr. I have downloaded the solr 3.3.0 distribution and am trying to 
run it using java -jar start.jar from the apache-solr-3.3.0\example directory 
(start.jar is present here). But I am getting the following error on running this 
command:

C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar 
java.lang.NullPointerException
        at java.io.File.<init>(File.java:222)
        at org.mortbay.start.Main.init(Main.java:465)
        at org.mortbay.start.Main.start(Main.java:439)
        at org.mortbay.start.Main.main(Main.java:119)

Could someone help me in resolving this issue?

Thanks & Regards
Anand Nigam


***
The Royal Bank of Scotland plc. Registered in Scotland No 90312. 
Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB. 
Authorised and regulated by the Financial Services Authority. The Royal Bank of 
Scotland N.V. is authorised and regulated by the De Nederlandsche Bank and has 
its seat at Amsterdam, the Netherlands, and is registered in the Commercial 
Register under number 33002587. Registered Office: Gustav Mahlerlaan 350, 
Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and The Royal Bank 
of Scotland plc are authorised to act as agent for each other in certain 
jurisdictions. 
  
This e-mail message is confidential and for use by the addressee only. 
If the message is received by anyone other than the addressee, please return 
the message to the sender by replying to it and then delete the message from 
your computer. Internet e-mails are not necessarily secure. The Royal Bank of 
Scotland plc and The Royal Bank of Scotland N.V. including its affiliates (RBS 
group) does not accept responsibility for changes made to this message after 
it was sent. For the protection of RBS group and its clients and customers, and 
in compliance with regulatory requirements, the contents of both incoming and 
outgoing e-mail communications, which could include proprietary information and 
Non-Public Personal Information, may be read by authorised persons within RBS 
group other than the intended recipient(s). 

Whilst all reasonable care has been taken to avoid the transmission of viruses, 
it is the responsibility of the recipient to ensure that the onward 
transmission, opening or use of this message and any attachments will not 
adversely affect its systems or data. No responsibility is accepted by the RBS 
group in this regard and the recipient should carry out such virus and other 
checks as it considers appropriate. 

Visit our website at www.rbs.com 

***
  


RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
 
Thanks for your reply Steve.

My environment details:

Java version: 1.6.0_24
System: Microsoft Windows  XP Professional Version 2002 Service Pack 3

Interestingly my colleagues who have the same environment are not facing this 
problem.

Thanks & Regards
Anand Nigam



Store complete XML record (DIH XPathEntityProcessor)

2011-07-27 Thread solruser@9913
I am trying to use DIH to import an XML based file with multiple XML records
in it.  Each record corresponds to one document in Lucene.  I am using the
DIH FileListEntityProcessor (to get file list) followed by the
XPathEntityProcessor to create the entities.  

It works perfectly and I am able to map XML elements to fields. However,
I also need to store the entire XML record as a separate 'full text' field. 
Is there any way the XPathEntityProcessor provides a variable like 'rawLine'
or 'plainText' that I can map to a field?

I tried to use the Plain Text processor after this  - but that does not
recognize the XML boundaries and just gives the whole XML file.


   <entity name="x" rootEntity="true" dataSource="logfilereader"
           processor="XPathEntityProcessor"
           url="${logfile.fileAbsolutePath}" stream="false"
           forEach="/xml/myrecord"
           transformer="">
     <field column="mycol1"
            xpath="/xml/myrecord/@something"
     />
 
and so on ...
This works perfectly.  However I also need something like ...

<field column="fullxmlrecord" name="plainText" />

Any help is much appreciated. I am a newbie and may be missing something
obvious here

-g



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3205524.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem starting solr on jetty

2011-07-27 Thread Anand.Nigam
Hi All,

I tried to debug the issue by running start.jar in the Eclipse debugger and found 
that the root of the issue was that the jetty.home system property was not set. 
If I set the jetty.home property then the server starts properly.
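
In case it helps someone else, here is a small sketch of how the property can be passed explicitly when launching from a script. The helper name is made up for illustration, and pointing jetty.home at the example directory is an assumption based on my setup; the only firm point is that JVM options such as -D must come before -jar.

```python
import os


def solr_start_command(example_dir):
    """Build a java command line that sets jetty.home explicitly (helper is hypothetical)."""
    jetty_home = os.path.abspath(example_dir)
    # -Dname=value defines a system property and must precede -jar
    return ["java", f"-Djetty.home={jetty_home}", "-jar", "start.jar"]


cmd = solr_start_command(".")
print(" ".join(cmd))
# To actually launch: subprocess.run(cmd, cwd=example_dir)
```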

Thanks,
Anand



RE: Problem starting solr on jetty

2011-07-27 Thread Steven A Rowe
Hi Anand,

Congrats!  And thanks for letting us know.

Steve


RE: Problem starting solr on jetty

2011-07-27 Thread Chris Hostetter

: I tried to debug the issue by runing start.jar in eclipse debuger and 
: found that the root of the issue was that the jetty.home system property 
: was not set. If I set the jetty.home property then the server starts 
: properly.

Hmmm, weird ... that still doesn't really make much sense.

The jetty.home property isn't required by Jetty (or solr). if it's unset, 
it defaults to the current working directory.


: I am new to solr. I have downloaded the solr 3.3.0 distribution and am trying to 
: run it using java -jar start.jar from the apache-solr-3.3.0\example directory 
: (start.jar is present here). But I am getting the following error on running this 
: command:
: 
: C:\downloads\apache-solr-3.3.0\apache-solr-3.3.0\example>java -jar start.jar 
: java.lang.NullPointerException
: at java.io.File.<init>(File.java:222)
: at org.mortbay.start.Main.init(Main.java:465)
: at org.mortbay.start.Main.start(Main.java:439)
: at org.mortbay.start.Main.main(Main.java:119)
: 
: Could someone help me in resolving this issue.


-Hoss