What does this mean?

2011-12-13 Thread Kashif Khan
I have field value cache stats. Can anyone tell me whether this shows any
problem with respect to performance?

name:fieldCache  
class:   org.apache.solr.search.SolrFieldCacheMBean  
version: 1.0  
description: Provides introspection of the Lucene FieldCache, this is
**NOT** a cache that is managed by Solr.  
stats:  entries_count : 4 
entry#0 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800
_5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359
_ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588
_orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009
_ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010
_stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101
_sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101
_stw(2.x):C101)'='2',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#26094814
 
entry#1 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800
_5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359
_ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588
_orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009
_ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010
_stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101
_sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101
_stw(2.x):C101)'='S677',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#19129406
 
entry#2 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800
_5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359
_ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588
_orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009
_ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010
_stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101
_sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101
_stw(2.x):C101)'='648',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#22924041
 
entry#3 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800
_5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359
_ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588
_orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009
_ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010
_stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101
_sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101
_stw(2.x):C101)'='359',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#12376920
 
insanity_count : 0 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-does-this-mean-tp3581884p3581884.html
Sent from the Solr - User mailing list archive at Nabble.com.


Too many connections in CLOSE_WAIT state on master solr server

2011-12-13 Thread samarth s
Hi,

I am using solr replication and am experiencing a lot of connections
in the state CLOSE_WAIT at the master solr server. These disappear
after a while, but till then the master solr stops responding.

There are about 130 open connections on the master server with the
client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
client port specified on the master solr server netstat results is not
visible in the netstat results on the client (slave solr) m/c.

Following is my environment:
- 40 cores in the master solr on m/c 1
- 40 cores in the slave solr on m/c 2
- The replication poll interval is 20 seconds.
- Replication part in solrconfig.xml in the slave solr:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">

    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">$mastercorename/replication</str>

    <!-- Interval at which the slave should poll the master.
         Format is HH:mm:ss. If this is absent the slave does not poll
         automatically, but a fetchindex can still be triggered from
         the admin page or the http API -->
    <str name="pollInterval">00:00:20</str>
    <!-- The following values are used when the slave connects to the
         master to download the index files. Default values are
         implicitly set as 5000ms and 1ms respectively. The user DOES
         NOT need to specify these unless the bandwidth is extremely
         low or there is an extremely high latency -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>
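Until the root cause is found, it helps to measure the symptom. A small sketch that tallies CLOSE_WAIT sockets per remote host from `netstat -tan` output (the sample lines below are invented for illustration):

```python
from collections import Counter

def count_close_wait(netstat_output):
    """Tally CLOSE_WAIT connections per remote host from `netstat -tan` lines."""
    counts = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # tcp lines end with the socket state; the foreign address is column 5
        if len(parts) >= 6 and parts[0].startswith("tcp") and parts[-1] == "CLOSE_WAIT":
            remote_host = parts[4].rsplit(":", 1)[0]
            counts[remote_host] += 1
    return counts

sample = """\
tcp 0 0 10.0.0.1:8983 10.0.0.2:51234 CLOSE_WAIT
tcp 0 0 10.0.0.1:8983 10.0.0.2:51235 CLOSE_WAIT
tcp 0 0 10.0.0.1:8983 10.0.0.3:40000 ESTABLISHED
"""
print(count_close_wait(sample))  # Counter({'10.0.0.2': 2})
```

Running this periodically on the master shows whether the leaked connections all come from one slave, which points at the poll/fetch cycle rather than general query traffic.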

Thanks for any pointers.

--
Regards,
Samarth


Re: Virtual Memory very high

2011-12-13 Thread Dmitry Kan
If you allow me to chime in, is there a way to check for which
DirectoryFactory is in use, if
${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured?

Dmitry

2011/12/12 Yury Kats yuryk...@yahoo.com

 On 12/11/2011 4:57 AM, Rohit wrote:
  What are the difference in the different DirectoryFactory?


 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html

 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html



Looking for a good commit/merge strategy

2011-12-13 Thread peter_solr
Hi all,

we are indexing real-time documents from various sources. Since we have
multiple sources, we encounter quite a number of duplicates which we delete
from the index. This mostly occurs within a short timeframe; deletes of
older documents may happen, but they do not have a high priority. Search
results do not need to be exactly real-time (they can be a minute or so
behind), but facet counts should be correct as we use them to visualize
frequencies in the data. We are now looking for a good commit/merge
strategy. Any advice?
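As a starting point, a time-bounded autocommit keeps facet counts no more than a fixed interval stale; a hedged sketch for solrconfig.xml (the values are only illustrations to tune, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit at most once a minute; searchable/facet state lags by
         at most this much, which fits the "1 minute behind" budget -->
    <maxTime>60000</maxTime>
    <!-- also commit early under heavy ingest to bound reopen cost -->
    <maxDocs>10000</maxDocs>
  </autoCommit>
</updateHandler>
```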

Thanks and best,
Peter

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582294.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Looking for a good commit/merge strategy

2011-12-13 Thread darren
How do you determine a duplicate?

Solr has de-duplication built in. You may also consider hashing
documents on some fields to create a consistent doc id that would be the
same for identical documents, and let Solr overwrite them. Either approach
would reduce or eliminate the possibility of duplicates and save time.
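The hashing idea can be sketched as follows (the field names are hypothetical; Solr's SignatureUpdateProcessorFactory provides a server-side equivalent):

```python
import hashlib

def doc_id(doc, key_fields=("title", "source_url")):
    """Build a deterministic id from the fields that define 'the same document'.

    Re-indexing a duplicate then overwrites the earlier copy instead of adding
    a second one, because Solr replaces documents with an equal uniqueKey.
    """
    raw = "\x1f".join(str(doc.get(f, "")).strip().lower() for f in key_fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

a = {"title": "Breaking News", "source_url": "http://a.example/1", "body": "x"}
b = {"title": "breaking news ", "source_url": "http://a.example/1", "body": "y"}
print(doc_id(a) == doc_id(b))  # True: same id despite case/whitespace noise
```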


 Hi all,

 we are indexing real-time documents from various sources. Since we have
 multiple sources, we encounter quite a number of duplicates which we
 delete
 from the index. This mostly occurs within a short timeframe; deletes of
 older documents may happen, but they do not have a high priority. Search
 results do not need to be exactly real-time (they can be a minute or so
 behind), but facet counts should be correct as we use them to visualize
 frequencies in the data. We are now looking for a good commit/merge
 strategy. Any advice?

 Thanks and best,
 Peter

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582294.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Problem with result grouping

2011-12-13 Thread Kissue Kissue
Hi,

Maybe there is something i am missing here but i have a field in my solr
index called categoryId. The field definition is as follows:

<field name="categoryId" type="string" indexed="true" stored="true"
required="true" />

I am trying to group on this field and i get a result as follows:
<str name="groupValue">43201810</str>
<result name="doclist" numFound="72" start="0">

This is the query i am sending to solr:
http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId

My understanding is that this means there are 72 documents in my index that
have the value 43201810 for categoryId. Now surprisingly when i search my
index specifically for categoryId:43201810 expecting to get 72 results i
instead get 124 results. This is the query sent:
http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on

Is my understanding of result grouping correct? Is there something I am
doing wrong? Any help will be much appreciated. I am using Solr 3.5.

Thanks.


Re: Looking for a good commit/merge strategy

2011-12-13 Thread solr-ra
Peter:

You may want to take a look at Solr 3.4 with RankingAlgorithm 1.3. It has
NRT support that allows you to search in real time with updates. The
performance is about 1 docs / sec with the MBArtists index (approx 43
fields ). MBArtists index is the index of artists from musicbrainz.org in
the Solr 1.4 Enterprise Server book. 

Regarding data visibility, you can configure this as a parameter in
solrconfig.xml as below:
   <realtime visible="200" facet="false">true</realtime>

The visible attribute (200) is in ms and controls the maximum duration for
which updated docs may not be visible in a search. The facet attribute can be
true or false, depending on whether you need real-time faceting. Real-time
faceting, depending on update load (for high update rates), can see
performance problems as the field cache is invalidated, so turn it on only
as needed.

You can get more information about the NRT with Solr 3.x and
RankingAlgorithm 1.3 from here:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x

You can download  Solr 3.4 with RankingAlgorithm 1.3 from here:
http://solr-ra.tgels.org

(there is an early access Solr 3.5 with RankingAlgorithm 1.3 release
available for download also)

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Quick relevance question

2011-12-13 Thread Erick Erickson
There's actually a Solr JIRA about this:
https://issues.apache.org/jira/browse/SOLR-2953

But that begs the question of why you want to do this.
Are you sure it would actually provide a better
experience for your users? The reason I ask is
that you could put a lot of effort into making this happen
and discover that it was a waste...

In other words, what's the use case?

Best
Erick

On Sun, Dec 11, 2011 at 5:57 PM, Ryan Gehring rgehr...@linkedin.com wrote:
 Hello!
 SOLR Newbie here. I managed to pick a use case that is a little goofy for my 
 first SOLR voyage and would love help.

 I'd like to rank documents by the total matching term count rather than the 
 usual cosine similarity stuff.

 I have a field named "all" which has all my searchable fields copyfielded to
 it. It seems like I need to use SOLR 4.0 and

 _val_:sum( termfreq(all, 'term1') , termfreq(all, 'term2') , … )

 for every query term. Is there a better way to do this?

 Thanks!
 Ryan Gehring



Re: Highlighter highlighting terms which are not part of the search

2011-12-13 Thread Erick Erickson
Well, we need some more details to even guess.
Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Mon, Dec 12, 2011 at 12:04 AM, Shyam Bhaskaran
shyam.bhaska...@synopsys.com wrote:
 Hi

 We recently upgraded our Solr to the latest 4.0 trunk and we are seeing a 
 weird behavior with highlighting which was not seen earlier.

 When a search query, for example "generate test pattern", is passed in, the
 result set obtained shows the highlighting properly in the first few results,
 but in the later results we see terms which were not part of the search, like
 "Question", "Answer", "used", etc., being highlighted. We are using the regular
 and termVector highlighters and never faced this kind of scenario; edismax is
 used in our configuration.

 Can someone point to what is causing this problem and where I need to look 
 into for fixing this?

 -Shyam


Re: Ask about the question of solr cache

2011-12-13 Thread Erick Erickson
Are you sure you commit after you're done? If you change
the index, this should all be automatic. Although that doesn't
make a lot of sense if you restart Solr because the
changes would probably be lost then.

But I'm a bit confused about what you mean by caches
not being updated. Do you mean search results?

Some details about how you're verifying that the results
aren't available would be helpful.

This should be all automatic, so my first guess would be
that you're not doing what you think you are G.

Best
Erick

On Mon, Dec 12, 2011 at 6:21 AM, JiaoyanChen chen00...@gmail.com wrote:
 When I delete or add data from an application through SolrJ, or
 import an index through the command "nutch solrindex", the caches of Solr
 are not updated unless I restart Solr.
 Could anyone tell me how I can update the Solr caches without restarting,
 using a shell command?
 When I recreate the index with Nutch, the data in Solr should update.
 I use "java -jar start.jar" to run Solr.
 Thanks!



Re: manipulate the results coming back from SOLR? (was: possible to do arithmetic on returned values?)

2011-12-13 Thread Erick Erickson
Erik Hatcher wrote you a comment assuming you were
using Velocity. The more generic form of that comment is
that this is an app-level issue by and large. Solr is in charge
of searching and returning data; the app is a better place
to turn that into something pretty...

Best
Erick

On Mon, Dec 12, 2011 at 9:37 AM, Gabriel Cooper gabriel.coo...@jtv.com wrote:
 I'm hoping I just got lost in the shuffle due to posting on a Friday night.
 Is there a way to change a field's data via some function, e.g. add,
 subtract, product, etc.?


 On 12/9/11 4:17 PM, Gabriel Cooper wrote:

 Is there a way to manipulate the results coming back from SOLR?

 I have a SOLR 3.5 index that contains values in cents (e.g. 100 in the
 index represents $1.00) and in certain contexts (e.g. CSV export) I'd
 like to divide by 100 for that field to provide a user-friendly in
 dollars number. To do this I played around with Function Queries for a
 while without realizing they're limited to relevancy scores, and later
 found DocTransformers in 4.0 whose description sounded right but don't
 exist in 3.5.

 Is there anything else I haven't considered?

 Thanks for any help

 Gabriel Cooper.




MLT as a nested query

2011-12-13 Thread Vyacheslav Zholudev
Hi, 

is it possible to use MLT as a nested query? I tried the following:
select?q=field1:foo field2:bar AND _query_:"{!mlt fl=mltField mindf=1
mintf=1 mlt.match.include=false}selectField:baz"

but it does not work with an error:
Unknown query type 'mlt'

I guess I should have an MLT parser enabled in solrconfig.xml, but I was not 
able to find an implementation.

Does anybody have any suggestions?

Vyacheslav



social/collaboration features on top of solr

2011-12-13 Thread Robert Stewart
Has anyone implemented some social/collaboration features on top of SOLR?  What 
I am thinking is ability to add ratings and comments to documents in SOLR and 
then be able to fetch comments and ratings for each document in results (and 
have as part of response from SOLR), similar in fashion to MLT results.  I 
think a separate index or separate core to store collaboration info would be 
needed, as well as a search component for fetching collaboration info for 
results.  I would think this would be a great feature and wondering if anyone 
has done something similar. 

Bob

RE: social/collaboration features on top of solr

2011-12-13 Thread Demian Katz
VuFind (http://vufind.org) uses Solr for library catalog (or similar) 
applications and features a MySQL database which it uses for storing user tags 
and comments outside of Solr itself.  If there were a mechanism more closely 
tied to Solr for achieving this sort of effect, that would allow VuFind to do 
things with considerably more elegance!

- Demian

 -Original Message-
 From: Robert Stewart [mailto:bstewart...@gmail.com]
 Sent: Tuesday, December 13, 2011 10:28 AM
 To: solr-user@lucene.apache.org
 Subject: social/collaboration features on top of solr
 
 Has anyone implemented some social/collaboration features on top of
 SOLR?  What I am thinking is ability to add ratings and comments to
 documents in SOLR and then be able to fetch comments and ratings for
 each document in results (and have as part of response from SOLR),
 similar in fashion to MLT results.  I think a separate index or
 separate core to store collaboration info would be needed, as well as a
 search component for fetching collaboration info for results.  I would
 think this would be a great feature and wondering if anyone has done
 something similar.
 
 Bob


Re: Looking for a good commit/merge strategy

2011-12-13 Thread peter_solr
@ project: Thanks for the hints, I will take a look!

@ Nagendra: Solr-RA seems very interesting! I take it that you can use it
with an existing index?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582626.html
Sent from the Solr - User mailing list archive at Nabble.com.


edismax phrase matching with a non-word char inbetween

2011-12-13 Thread Robert Brown

I have a field which is indexed and queried as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>

<filter class="solr.SynonymFilterFactory" synonyms="text-synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="1"/>

<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>




When searching for "street work" (with quotes), I'm getting matches
and highlighting on things like...

...Oxford <em>Street</em> (<em>Work</em> Experience)...

Why is this happening, and what can I do to stop it?

I've set <int name="qs">0</int> in my config to try to avert this
sort of behaviour; am I correct in thinking that this is used to ensure
there are no words in between the phrase words?
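One way to check what the phrase becomes after analysis is to ask Solr to echo the parsed query (an illustrative URL; debugQuery=true is the essential part):

```
http://localhost:8080/solr/select?q="street work"&defType=edismax&qs=0&debugQuery=true
```

The debug section of the response shows the parsed phrase query and the slop actually applied, which makes it clear whether qs is taking effect or whether the analysis chain (e.g. stopword position increments) is widening the match.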




Re: Looking for a good commit/merge strategy

2011-12-13 Thread Nagendra Nagarajayya
Yes, no changes to your existing index. No commit needed. You may want 
to change your autocommit interval to about 15 mins ...


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 12/13/2011 7:32 AM, peter_solr wrote:

@ project: Thanks for the hints, I will take a look!

@ Nagendra: Solr-RA seems very interesting! I take it that you can use it
with an existing index?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582626.html
Sent from the Solr - User mailing list archive at Nabble.com.






Matching all documents in the index

2011-12-13 Thread Kissue Kissue
Hi,

I have come across this query in the admin interface: *.*
Is this meant to match all documents in my index?

Currently when I run a query with q=*.*, numFound is 130310, but the actual
number of documents in my index is 603308.
When I then run the query with q=*, numFound is 603308, which is the
total number of documents in my index.

So what is the difference between a query with q=*.* and q=*?

I ran into this problem because I have a particular scenario where my
index has a field called categoryId which I am grouping on and
another field called orgId which I then filter on. So I do grouping on
categoryId but on all documents in the index matching the filter query
field. I use q=*.* but this doesn't give me the true picture as
highlighted above. So I use q=* and this works fine but takes about
2900ms to execute. Is this efficient? Is there a better way to do something
like this?

Solr version = 3.5

Thanks.


CRUD on solr Index while replicating between master/slave

2011-12-13 Thread Tarun Jain
Hi,
When replication is happening between master and slave, what operations can we
do on the master, and what operations are possible on the slave?
I know it is not advisable to do DML on the slave index, but I wanted to know
this anyway. Also, I understand that doing DML on a slave will make the slave
index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-


Re: Matching all documents in the index

2011-12-13 Thread Simon Willnauer
try *:* instead of *.*

simon

On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com wrote:
 Hi,

 I have come across this query in the admin interface: *.*
 Is this meant to match all documents in my index?

 Currently when I run a query with q=*.*, numFound is 130310, but the actual
 number of documents in my index is 603308.
 When I then run the query with q=*, numFound is 603308, which is the
 total number of documents in my index.

 So what is the difference between a query with q=*.* and q=*?

 I ran into this problem because I have a particular scenario where my
 index has a field called categoryId which I am grouping on and
 another field called orgId which I then filter on. So I do grouping on
 categoryId but on all documents in the index matching the filter query
 field. I use q=*.* but this doesn't give me the true picture as
 highlighted above. So I use q=* and this works fine but takes about
 2900ms to execute. Is this efficient? Is there a better way to do
 something like this?

 Solr version = 3.5

 Thanks.


Best way to convert a field in a query to a fq?

2011-12-13 Thread Andrew Lundgren
I want to modify incoming queries such that a field is always transformed to a 
filter query.  For example, I want to convert a query field like q= ... 
part_page=3 ...  to a filter query like q= ... fq=partpage(3) .

Is the right way to do this in a custom component, or is there someplace else 
where this should be handled?

We have several clients and would like to protect the server from this field 
being queried on even if they make a mistake.

Thank you.

--
Andrew Lundgren
lundg...@familysearch.org






Re: Matching all documents in the index

2011-12-13 Thread Kissue Kissue
Hi Simon,

Thanks for this. Query time dramatically reduced to 27ms with this.

Many thanks.

On Tue, Dec 13, 2011 at 4:20 PM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 try *:* instead of *.*

 simon

 On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I have come across this query in the admin interface: *.*
  Is this meant to match all documents in my index?
 
  Currently when I run a query with q=*.*, numFound is 130310, but the actual
  number of documents in my index is 603308.
  When I then run the query with q=*, numFound is 603308, which is the
  total number of documents in my index.

  So what is the difference between a query with q=*.* and q=*?

  I ran into this problem because I have a particular scenario where my
  index has a field called categoryId which I am grouping on and
  another field called orgId which I then filter on. So I do grouping on
  categoryId but on all documents in the index matching the filter query
  field. I use q=*.* but this doesn't give me the true picture as
  highlighted above. So I use q=* and this works fine but takes about
  2900ms to execute. Is this efficient? Is there a better way to do
  something like this?
 
  Solr version = 3.5
 
  Thanks.



Re: Create 2 index with solr

2011-12-13 Thread pratikmhta
Thank you Dimitry Kan

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Create-2-index-with-solr-tp2730485p3581568.html
Sent from the Solr - User mailing list archive at Nabble.com.


reposting highlighting questions

2011-12-13 Thread Bent Jensen
I am new to solr/xml/xslt and am trying to figure out how to display search query
fields highlighted in HTML. I can enable highlighting in the query, and I
think I get the correct XML response back (see below: I search using 'contents'
and the highlighting is wrapped in '<strong>' and '</strong>'). However, I cannot
figure out what to add to the XSLT file to transform it into HTML. I think it is
a question of defining the appropriate XPath(?), but I am stuck. Can someone
point me in the right direction? Thanks in advance!

 

Here is the result I get back:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.simple.pre">'&lt;strong&gt;'</str>
      <str name="hl.fl">*</str>
      <str name="wt"/>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl"/>
      <str name="start">0</str>
      <str name="q">contents</str>
      <str name="hl.simple.post">'&lt;/strong&gt;'</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="content">
        <str>Start with the Table of Contents. See if you can find the topic that you
are interested in. Look through the section to see if there is a resource that
can help you. If you find one, you may want to attach a Post-it tab so you can
find the page later. Write down all of the information that you need to find
out more information about the resource: agency name, name of contact person,
telephone number, email and website addresses. If you were unable to find a
resource that will help you in this resource guide, a good first step would be
to call your local Independent Living Center. They will have a good idea of
what is available in your area. A second step would be to call or email us at
the Rehabilitation Research Center. We have a ROBOT resource specialist who may
be able to assist.</str>
      </arr>
      <arr name="doclink">
        <str>robot.pdf#page=11</str>
      </arr>
      <str name="heading1">CHAPTER 1: How to Use This Resource Guide</str>
      <str name="id">1-1</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1-1">
      <arr name="content">
        <str>Start with the Table of '&lt;strong&gt;'Contents'&lt;/strong&gt;'. See if you can
find the topic that you are interested in. Look</str>
      </arr>
    </lst>
  </lst>
</response>
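A minimal XSLT fragment that pulls out just the highlighting snippets might look like this (element names follow the response above; an untested sketch to drop into an existing stylesheet, not a complete one):

```xml
<xsl:template match="lst[@name='highlighting']/lst">
  <p class="snippet">
    <!-- the snippet text carries the hl.simple.pre/post markers verbatim;
         disable-output-escaping lets them through as real HTML tags -->
    <xsl:value-of select="arr[@name='content']/str"
                  disable-output-escaping="yes"/>
  </p>
</xsl:template>
```

The key is that the highlights live under the top-level lst[@name='highlighting'] element, keyed by document id, not inside the result/doc elements themselves.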

 




Combination of edgengram and ngram

2011-12-13 Thread Shawn Heisey
I am interested in a new filter type, one that would combine edgengram 
and ngram.  The idea is that it would create all ngrams specified by the 
min/max size, but the ngrams that happen to be edgengrams (specifically 
the left side) would get an index-time boost.  Optionally the boost 
would be higher if it came from the first token.


The use case:  An automatic autosuggest dropdown that populates as a 
user types into a search box.  The index would have one field and it 
would be built from a manually produced list of suggested search 
phrases.  The boosts mentioned would make it so that matches from the 
beginning of a word, and especially from the beginning of the entire 
suggested phrase, would be returned first.


I could get a similar effect by using a copyfield, analyzing one field 
with ngrams and the other with edgengrams, then using edismax to put a 
boost on the edge version.  I will start with this method, but using 
copyfield makes the index bigger, and using dismax makes the ultimate 
parsed queries more complicated.
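The interim copyField approach might be wired up like this (field and type names are invented for illustration):

```xml
<!-- schema.xml: index the same suggestion text two ways -->
<field name="suggest_ngram" type="text_ngram"     indexed="true" stored="false"/>
<field name="suggest_edge"  type="text_edgengram" indexed="true" stored="false"/>
<copyField source="suggest" dest="suggest_ngram"/>
<copyField source="suggest" dest="suggest_edge"/>

<!-- request handler defaults: boost matches that start at a word edge -->
<str name="defType">edismax</str>
<str name="qf">suggest_ngram^1.0 suggest_edge^5.0</str>
```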


If I can avoid the copyfield, the index will be smaller and the queries 
very simple, which should make for very high speed.


I will take a look at the source code, but I'm a bit of a Java novice.  
Does anyone have the knowledge, desire, and time to crank this one out 
quickly?  Is it possible someone has already written such a filter?


Thanks,
Shawn



Re: MySQL data import

2011-12-13 Thread Shawn Heisey

On 12/11/2011 1:54 PM, Brian Lamb wrote:

By nature of my schema, I have several multivalued fields. Each one I
populate with a separate entity. Is there a better way to do it? For
example, could I pull in all the singular data in one sitting and then come
back in later and populate with the multivalued items.

An alternate approach in some cases would be to do a GROUP_CONCAT and then
populate the multivalued column with some transformation. Is that possible?

Lastly, is it possible to use copyField to copy three regular fields into
one multiValued field and have all the data show up?


The best way to proceed may depend on whether you actually need the 
field to be multivalued (returning an array in search results), or if 
you simply need to be able to search on all the values.  For me, it's 
the latter - the field isn't stored.


I use the GROUP_CONCAT method (hidden in a database view, so Solr 
doesn't need to know about it) to put multiple values into a field, 
separated by semicolons.  I then use the following single-valued 
fieldType to split those up and make all the values searchable.  The 
tokenizer splits by semicolons followed by zero or more spaces, the 
pattern filter strips leading and trailing punctuation from each token.  
The ICU filter is basically a better implementation of the ascii folding 
filter and the lowercase filter, in a single pass.  The others are 
fairly self-explanatory:


<!-- lowercases, tokenize by semicolons -->
<fieldType name="lcsemi" class="solr.TextField" sortMissingLast="true"
positionIncrementGap="0" omitNorms="true">

<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="; *"/>
<filter class="solr.PatternReplaceFilterFactory"
  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
  replacement="$2"
  allowempty="false"
/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>
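The intent of that chain, minus the ICU folding, can be mimicked in a few lines of Python (a rough model of the behavior, not the actual Lucene code):

```python
import re
import string

def lcsemi_analyze(value):
    """Split on semicolons, strip leading/trailing punctuation, lowercase,
    and drop duplicates while keeping order -- mirroring the fieldType above."""
    seen, out = set(), []
    for token in re.split(r";\s*", value):
        token = token.strip(string.punctuation).strip().lower()
        if token and token not in seen:
            seen.add(token)
            out.append(token)
    return out

print(lcsemi_analyze("Rock; (Hard Rock); rock; Jazz!"))
# ['rock', 'hard rock', 'jazz']
```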

If you actually do need the field to be multivalued, then you'll need to 
do dataimport transformation as mentioned by Gora, who also replied.


Thanks,
Shawn



RE: Virtual Memory very high

2011-12-13 Thread Rohit
Thanks Yurykats. 

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg


-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 13 December 2011 11:17
To: solr-user@lucene.apache.org
Subject: Re: Virtual Memory very high

If you allow me to chime in, is there a way to check for which
DirectoryFactory is in use, if
${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured?

Dmitry

2011/12/12 Yury Kats yuryk...@yahoo.com

 On 12/11/2011 4:57 AM, Rohit wrote:
  What are the difference in the different DirectoryFactory?


 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html

 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html




How to get SolrServer

2011-12-13 Thread Joey
Hi, I am new to Solr and want to do some custom development.
I have wrapped Solr in my own web application, and want to write a servlet
to index a file system.

The question is how can I get SolrServer inside my Servlet?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Matching all documents in the index

2011-12-13 Thread Chris Hostetter

: Thanks for this. Query time dramatically reduced to 27ms with this.

to understand what is going on, use debugQuery=true with each of those 
examples and look at the query toString info.

*:* is the one and only true syntax (in any Solr QParser that I know 
of) for finding all docs efficiently.

q=*
or  q=*.*

are (if I'm not mistaken) a prefix query and a wildcard query 
(respectively) against the default search field 

(* is a prefix query matching all terms that start with the empty 
string -- so all terms, which means it should match any doc that has 
any term in the default search field -- which in many indexes 
will be all docs, but is a very inefficient and easily fallible way to try 
and match all docs)


-Hoss


RE: Problem with result grouping

2011-12-13 Thread Young, Cody
There's another discussion going on about this on the solr-user mailing
list, but I think you're using *.* instead of *:* to match all
documents. *.* ends up doing a search against the default field, whereas
*:* means match all documents.

Cody
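When building that match-all URL in client code, only the colon needs escaping; a small pure-JDK sketch (host and parameters modeled on the query in the message below, otherwise illustrative):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MatchAllUrl {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The match-all query is *:* (colon), not *.* (dot).
        String q = URLEncoder.encode("*:*", "UTF-8");
        // URLEncoder leaves '*' untouched and encodes ':' as %3A.
        String url = "http://localhost:8080/solr/catalogue/select/?q=" + q
                + "&group=true&group.field=categoryId";
        System.out.println(url); // ...?q=*%3A*&group=true&group.field=categoryId
    }
}
```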

-Original Message-
From: Kissue Kissue [mailto:kissue...@gmail.com] 
Sent: Tuesday, December 13, 2011 6:06 AM
To: solr-user@lucene.apache.org
Subject: Problem with result grouping

Hi,

Maybe there is something I am missing here, but I have a field in my Solr
index called categoryId. The field definition is as follows:

<field name="categoryId" type="string" indexed="true" stored="true"
required="true" />

I am trying to group on this field and I get a result as follows:
<str name="groupValue">43201810</str>
<result name="doclist" numFound="72" start="0">

This is the query I am sending to Solr:
http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId

My understanding is that this means there are 72 documents in my index
that have the value 43201810 for categoryId. Now, surprisingly, when I
search my index specifically for categoryId:43201810, expecting to get 72
results, I instead get 124 results. This is the query sent:
http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on

Is my understanding of result grouping correct? Is there something I am
doing wrong? Any help will be much appreciated. I am using Solr 3.5.

Thanks.


Re: highlighting questions

2011-12-13 Thread Erick Erickson
How to get *what* to display in HTML? The VelocityResponseWriter?
Extracting this content to show in your webapp? How are you displaying
any page at all? You can look at the examples in the VelocityResponseWriter
to get an idea of how to do this with that templating engine.

But the general idea here is that whatever parses your response has to
match the id field in the <doc> tag with the proper element from
the <lst name="highlighting"> element and mix-n-match them.

Hope that helps
Erick
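One possible direction for the XSLT side (a sketch only, not a tested stylesheet; the element names and the id come from the response pasted below, and the surrounding <xsl:stylesheet> boilerplate is assumed):

```xml
<!-- Inside the template that renders each <doc>: look up the matching
     entry under <lst name="highlighting"> by document id. -->
<xsl:template match="doc">
  <xsl:variable name="id" select="str[@name='id']"/>
  <div class="snippet">
    <!-- disable-output-escaping lets the <strong> markers render as HTML -->
    <xsl:value-of
        select="/response/lst[@name='highlighting']/lst[@name=$id]/arr[@name='content']/str"
        disable-output-escaping="yes"/>
  </div>
</xsl:template>
```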

On Mon, Dec 12, 2011 at 8:03 PM, Bent Jensen bentjen...@yahoo.com wrote:


 I am trying to figure out how to display search query fields highlighted in 
 HTML. I can enable the highlighting in the query, and I think I get the 
 correct response back (see below: I search using 'Contents' and the 
 highlighting is shown with <strong> and </strong>). However, I can't figure 
 out what to add to the XSLT file to display it in HTML. I think it is a question 
 of defining the appropriate XPath(?), but I am stuck. Can someone point me in 
 the right direction? Thanks in advance!


 Here is the result I get back:
 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">20</int>
     <lst name="params">
       <str name="explainOther"/>
       <str name="indent">on</str>
       <str name="hl.simple.pre">'&lt;strong&gt;'</str>
       <str name="hl.fl">*</str>
       <str name="wt"/>
       <str name="hl">on</str>
       <str name="rows">10</str>
       <str name="version">2.2</str>
       <str name="fl"/>
       <str name="start">0</str>
       <str name="q">contents</str>
       <str name="hl.simple.post">'&lt;/strong&gt;'</str>
       <str name="qt"/>
       <str name="fq"/>
     </lst>
   </lst>
   <result name="response" numFound="1" start="0">
     <doc>
       <arr name="content">
         <str>Start with the Table of Contents. See if you can find the topic that 
 you are interested in. Look through the section to see if there is a resource 
 that can help you. If you find one, you may want to attach a Post-it tab so 
 you can find the page later. Write down all of the information that you need 
 to find out more information about the resource: agency name, name of contact 
 person, telephone number, email and website addresses. If you were unable to 
 find a resource that will help you in this resource guide, a good first step 
 would be to call your local Independent Living Center. They will have a good 
 idea of what is available in your area. A second step would be to call or 
 email us at the Rehabilitation Research Center. We have a ROBOT resource 
 specialist who may be able to assist. You can reach Lois Roberts, the “Back 
 On Track …To Success” Mentoring Program Assistant, at 408-793-6426 or email 
 her at lois.robe...@hhs.sccgov.org</str>
       </arr>
       <arr name="doclink">
         <str>robot.pdf#page=11</str>
       </arr>
       <str name="heading1">CHAPTER 1: How to Use This Resource Guide</str>
       <str name="id">1-1</str>
     </doc>
   </result>
   <lst name="highlighting">
     <lst name="1-1">
       <arr name="content">
         <str>Start with the Table of '&lt;strong&gt;'Contents'&lt;/strong&gt;'. See if you can 
 find the topic that you are interested in. Look</str>
       </arr>
     </lst>
   </lst>
 </response>


solr ignore duplicate documents

2011-12-13 Thread Alexander Aristov
People,

I am asking for your help with solr.

When a document is sent to solr and such document already exists in its
index (by its ID) then the new doc replaces the old one.

But I don't want to automatically replace documents. Just ignore and
proceed to the next. How can I configure solr to do so?

Of course I can query Solr to check if it has the document already, but that's
bad for me since I do bulk updates; this would complicate the process and
increase the number of requests.

So are there any ways to configure solr to ignore duplicates? Just ignore.
I don't need any specific responses or actions.

Best Regards
Alexander Aristov


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Anybody could help?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Patrick Plaatje
Have a look here first; you'll probably end up using EmbeddedSolrServer.

http://wiki.apache.org/solr/Solrj

Patrick


Op 13 dec. 2011 om 20:38 heeft Joey vanjo...@gmail.com het volgende 
geschreven:

 Anybody could help?
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Thanks Patrick  for the reply. 

What I did was un-jar solr.war and created my own web application. Now I
want to write my own servlet to index all files inside a folder. 

I suppose there is already solrserver instance initialized when my web app
started. 

How can I access that solr server instance in my servlet?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to convert a field in a query to a fq?

2011-12-13 Thread Otis Gospodnetic
Hi,

We've done similar query rewriting in a custom SearchComponent that runs before 
QueryComponent.
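As a rough illustration of the kind of rewrite described above, here is the string-level part only (pure JDK; the SearchComponent plumbing is not shown, and all class and method names here are made up, with the part_page field taken from the question quoted below):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull "part_page=<n>" out of a raw q string and emit it as a
// separate fq clause, leaving the rest of the query untouched.
public class QueryRewriter {
    private static final Pattern PART_PAGE =
            Pattern.compile("\\bpart_page=(\\d+)\\s*");

    /** Returns { rewrittenQ, fqOrNull }. */
    public static String[] rewrite(String q) {
        Matcher m = PART_PAGE.matcher(q);
        if (!m.find()) {
            return new String[] { q, null };
        }
        String fq = "part_page:" + m.group(1);
        String rest = m.replaceFirst("").trim();
        return new String[] { rest, fq };
    }

    public static void main(String[] args) {
        String[] out = rewrite("title:solr part_page=3 body:index");
        System.out.println(out[0]); // title:solr body:index
        System.out.println(out[1]); // part_page:3
    }
}
```

In a real component you would do the equivalent against the SolrParams of the request before QueryComponent runs.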

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



 From: Andrew Lundgren lundg...@familysearch.org
To: solr-user@lucene.apache.org solr-user@lucene.apache.org 
Sent: Tuesday, December 13, 2011 11:58 AM
Subject: Best way to convert a field in a query to a fq?
 
I want to modify incoming queries such that a field is always transformed to a 
filter query.  For example, I want to convert a query field like q= ... 
part_page=3 ...  to a filter query like q= ... fq=partpage(3) .

Is the right way to do this in a custom component, or is there someplace else 
where this should be handled?

We have several clients and would like to protect the server from this field 
being queried on even if they make a mistake.

Thank you.

--
Andrew Lundgren
lundg...@familysearch.org


NOTICE: This email message is for the sole use of the intended recipient(s) 
and may contain confidential and privileged information. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please contact the sender by reply email and destroy all 
copies of the original message.






Re: CRUD on solr Index while replicating between master/slave

2011-12-13 Thread Otis Gospodnetic
Hi,

Master: Update/insert/delete docs    --    Yes
Slaves: Search                              --   Yes

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org 
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave
 
Hi,
When replication is happening from master to slave, what operations can we 
do on the master and what operations are possible on the slave?
I know it is not advisable to do DML on the slave index, but I wanted to know 
this anyway. Also, I understand that doing DML on a slave will make the slave 
index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-




Re: How to get SolrServer within my own servlet

2011-12-13 Thread Patrick Plaatje
Hey Joey,

You should first configure your deployed Solr instance by adding/changing the 
schema.xml and solrconfig.xml. After that you can use SolrJ to connect to that 
Solr instance and add documents to it. On the link I posted earlier, you'll 
find a couple of examples on how to do that.

- Patrick 

Verstuurd vanaf mijn iPhone

Op 13 dec. 2011 om 20:53 heeft Joey vanjo...@gmail.com het volgende 
geschreven:

 Thanks Patrick  for the reply. 
 
 What I did was un-jar solr.war and created my own web application. Now I
 want to write my own servlet to index all files inside a folder. 
 
 I suppose there is already solrserver instance initialized when my web app
 started. 
 
 How can I access that solr server instance in my servlet?
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html
 Sent from the Solr - User mailing list archive at Nabble.com.


RE: Best way to convert a field in a query to a fq?

2011-12-13 Thread Andrew Lundgren
Thanks for the confirmation!

 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Tuesday, December 13, 2011 1:02 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Best way to convert a field in a query to a fq?
 
 Hi,
 
 We've done similar query rewriting in a custom SearchComponent that
 runs before QueryComponent.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  From: Andrew Lundgren lundg...@familysearch.org
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, December 13, 2011 11:58 AM
 Subject: Best way to convert a field in a query to a fq?
 
 I want to modify incoming queries such that a field is always
 transformed to a filter query.  For example, I want to convert a query
 field like q= ... part_page=3 ...  to a filter query like q= ...
 fq=partpage(3) .
 
 Is the right way to do this in a custom component, or is there
 someplace else where this should be handled?
 
 We have several clients and would like to protect the server from this
 field being queried on even if they make a mistake.
 
 Thank you.
 
 --
 Andrew Lundgren
 lundg...@familysearch.org
 
 
 
 
 
 
 






Re: How to get SolrServer within my own servlet

2011-12-13 Thread Mikhail Khludnev
The first drawback of SolrJ is that it uses XML serialization even for
in-process communication.
I guess you can start from the SolrDispatchFilter source and get something for
your servlet from there.

Regards

On Tue, Dec 13, 2011 at 11:45 PM, Patrick Plaatje pplaa...@gmail.comwrote:

 Have a look here first; you'll probably end up using
 EmbeddedSolrServer.

 http://wiki.apache.org/solr/Solrj

 Patrick


 Op 13 dec. 2011 om 20:38 heeft Joey vanjo...@gmail.com het volgende
 geschreven:

  Anybody could help?
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: solr ignore duplicate documents

2011-12-13 Thread Mikhail Khludnev
Man,

Does overwrite=false work for you?
 http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents
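For reference, the flag goes on the <add> element of the update message, roughly like this (field names are illustrative). Note, though, that as I read the wiki, overwrite=false means Solr simply skips the duplicate check, so you may end up with two copies of the document rather than the new one being ignored; verify the behavior on your version:

```xml
<add overwrite="false">
  <doc>
    <field name="id">doc-42</field>
    <field name="title">Example title</field>
  </doc>
</add>
```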

Regards

On Tue, Dec 13, 2011 at 11:34 PM, Alexander Aristov 
alexander.aris...@gmail.com wrote:

 People,

 I am asking for your help with solr.

 When a document is sent to solr and such document already exists in its
 index (by its ID) then the new doc replaces the old one.

 But I don't want to automatically replace documents. Just ignore and
 proceed to the next. How can I configure solr to do so?

 Of course I can query solr to check if it has the document already but it's
 bad for me since I do bulk updates and this will complicate the process and
 increase amount of request.

 So are there any ways to configure solr to ignore duplicates? Just ignore.
 I don't need any specific responses or actions.

 Best Regards
 Alexander Aristov




-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Too many connections in CLOSE_WAIT state on master solr server

2011-12-13 Thread Mikhail Khludnev
You can try to reuse your connections (prevent them from closing) by
specifying -Dhttp.maxConnections=N in the JVM startup params, at the client
JVM (see http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html).
N should be chosen considering the number of connections you'd like to keep alive.

Let me know if it works for you.
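If changing the startup flags is inconvenient, the same property can also be set early in client code, before the first connection is opened (a sketch; the value 50 is arbitrary):

```java
public class KeepAliveConfig {
    public static void main(String[] args) {
        // Must run before the first HttpURLConnection is created;
        // http.maxConnections controls the per-destination keep-alive pool.
        System.setProperty("http.maxConnections", "50");
        // Keep-alive must stay enabled for the pool to be used at all.
        System.setProperty("http.keepAlive", "true");
        System.out.println(System.getProperty("http.maxConnections")); // 50
    }
}
```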

On Tue, Dec 13, 2011 at 2:57 PM, samarth s samarth.s.seksa...@gmail.comwrote:

 Hi,

 I am using solr replication and am experiencing a lot of connections
 in the state CLOSE_WAIT at the master solr server. These disappear
 after a while, but till then the master solr stops responding.

 There are about 130 open connections on the master server with the
 client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
 client port specified on the master solr server netstat results is not
 visible in the netstat results on the client (slave solr) m/c.

 Following is my environment:
 - 40 cores in the master solr on m/c 1
 - 40 cores in the slave solr on m/c 2
 - The replication poll interval is 20 seconds.
 - Replication part in solrconfig.xml in the slave solr:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">

     <!-- fully qualified url for the replication handler of master -->
     <str name="masterUrl">$mastercorename/replication</str>

     <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
          If this is absent slave does not poll automatically.
          But a fetchindex can be triggered from the admin or the http API -->
     <str name="pollInterval">00:00:20</str>
     <!-- The following values are used when the slave connects to the master
          to download the index files. Default values implicitly set as 5000ms
          and 10000ms respectively. The user DOES NOT need to specify these
          unless the bandwidth is extremely low or if there is an extremely
          high latency -->
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">10000</str>
   </lst>
 </requestHandler>

 Thanks for any pointers.

 --
 Regards,
 Samarth




-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: How to get SolrServer

2011-12-13 Thread Schmidt Jeff
Joey:

I'm not sure what you mean by wrapping Solr into your own web application.  There 
is a way to embed Solr into your application (same JVM), but I've never used 
that.  If you're talking about your servlet running in one JVM and Solr in 
another, then use the SolrJ client library to interact with Solr.  I use 
CommonsHttpSolrServer 
(http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html)
 and specify the URL that locates the Solr server/core name.

I use Spring to instantiate the server instance, and then I inject it where I 
need it.

<bean id="solrServerIngContent" 
    class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg value="http://localhost:8091/solr/mycorename"/>
</bean>

This is equivalent to new 
CommonsHttpSolrServer("http://localhost:8091/solr/mycorename");

Check out the API link above and http://wiki.apache.org/solr/Solrj for examples 
of using the SolrJ API.

Cheers,

Jeff

On Dec 13, 2011, at 12:12 PM, Joey wrote:

 Hi I am new to Solr and want to do some customize development. 
 I have wrapped solr to my own web application, and want to write a servlet
 to index a file system. 
 
 The question is how can I get SolrServer inside my Servlet?
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: CRUD on solr Index while replicating between master/slave

2011-12-13 Thread Tarun Jain
Hi,
Thanks.
So just to clarify here again: while replicating, we cannot search on the master 
index?

Tarun Jain
-=-



- Original Message -
From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Cc: 
Sent: Tuesday, December 13, 2011 3:03 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

Hi,

Master: Update/insert/delete docs    --    Yes
Slaves: Search                              --   Yes

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org 
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave
 
Hi,
When replication is happening from master to slave, what operations can we 
do on the master and what operations are possible on the slave?
I know it is not advisable to do DML on the slave index, but I wanted to know 
this anyway. Also, I understand that doing DML on a slave will make the slave 
index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-





Re: solr ignore duplicate documents

2011-12-13 Thread Erick Erickson
You're probably talking about a custom update handler here. That
way you can do a document ID lookup, that is, just see if the
incoming document ID is in the index already and throw
the document away if you find one. This should be very
efficient, much more efficient than making a separate query
for each one.

There's no way that I know of to do this out of the box in Solr though.

Best
Erick

On Tue, Dec 13, 2011 at 3:44 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Man,

 Does overwrite=false work for you?
  http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents

 Regards

 On Tue, Dec 13, 2011 at 11:34 PM, Alexander Aristov 
 alexander.aris...@gmail.com wrote:

 People,

 I am asking for your help with solr.

 When a document is sent to solr and such document already exists in its
 index (by its ID) then the new doc replaces the old one.

 But I don't want to automatically replace documents. Just ignore and
 proceed to the next. How can I configure solr to do so?

 Of course I can query solr to check if it has the document already but it's
 bad for me since I do bulk updates and this will complicate the process and
 increase amount of request.

 So are there any ways to configure solr to ignore duplicates? Just ignore.
 I don't need any specific responses or actions.

 Best Regards
 Alexander Aristov




 --
 Sincerely yours
 Mikhail Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com


Re: Too many connections in CLOSE_WAIT state on master solr server

2011-12-13 Thread Erick Erickson
Replicating 40 cores every 20 seconds is just *asking* for trouble.
How often do your cores change on the master? How big are
they? Is there any chance you just have too many cores replicating
at once?

Best
Erick

On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 You can try to reuse your connections (prevent them from closing) by
 specifying  
 -Dhttp.maxConnections=http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.htmlN
 in jvm startup params. At client JVM!. Number should be chosen considering
 the number of connection you'd like to keep alive.

 Let me know if it works for you.

 On Tue, Dec 13, 2011 at 2:57 PM, samarth s 
 samarth.s.seksa...@gmail.comwrote:

 Hi,

 I am using solr replication and am experiencing a lot of connections
 in the state CLOSE_WAIT at the master solr server. These disappear
 after a while, but till then the master solr stops responding.

 There are about 130 open connections on the master server with the
 client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
 client port specified on the master solr server netstat results is not
 visible in the netstat results on the client (slave solr) m/c.

 Following is my environment:
 - 40 cores in the master solr on m/c 1
 - 40 cores in the slave solr on m/c 2
 - The replication poll interval is 20 seconds.
 - Replication part in solrconfig.xml in the slave solr:
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">

      <!-- fully qualified url for the replication handler of master -->
      <str name="masterUrl">$mastercorename/replication</str>

      <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
           If this is absent slave does not poll automatically.
           But a fetchindex can be triggered from the admin or the http API -->
      <str name="pollInterval">00:00:20</str>
      <!-- The following values are used when the slave connects to the master
           to download the index files. Default values implicitly set as 5000ms
           and 10000ms respectively. The user DOES NOT need to specify these
           unless the bandwidth is extremely low or if there is an extremely
           high latency -->
      <str name="httpConnTimeout">5000</str>
      <str name="httpReadTimeout">10000</str>
    </lst>
  </requestHandler>

 Thanks for any pointers.

 --
 Regards,
 Samarth




 --
 Sincerely yours
 Mikhail Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com


Re: CRUD on solr Index while replicating between master/slave

2011-12-13 Thread Erick Erickson
No, you can search on the master when replicating, no
problem.

But why do you want to? The whole point of master/slave
setups is to separate indexing from searching machines.

Best
Erick

On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote:
 Hi,
 Thanks.
 So just to clarify here again while replicating we cannot search on master 
 index ?

 Tarun Jain
 -=-



 - Original Message -
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Cc:
 Sent: Tuesday, December 13, 2011 3:03 PM
 Subject: Re: CRUD on solr Index while replicating between master/slave

 Hi,

 Master: Update/insert/delete docs    --    Yes
 Slaves: Search                              --   Yes

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave

Hi,
When replication is happening from master to slave, what operations can we 
do on the master and what operations are possible on the slave?
I know it is not advisable to do DML on the slave index, but I wanted to know 
this anyway. Also, I understand that doing DML on a slave will make the slave 
index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-





Migrate Lucene 2.9 To SOLR

2011-12-13 Thread Anderson vasconcelos
Hi

I have an old project that uses Lucene 2.9. Is it possible to use the index
created by Lucene in Solr? May I just copy the index to the data directory of
Solr, or does some mechanism exist to import a Lucene index?

Thanks


Re: How to get SolrServer within my own servlet

2011-12-13 Thread Joey
Thank you guys for the reply.

So what I want to do is to modify Solr a bit - add one servlet so I can
trigger a full index of a folder in the file system.

What I did:
   un-jar solr.war;
   Create a web app and copy the un-jar the solr files to this app;
   Create my servlet;
   Repackage the web app to a war and deploy;

by following the suggestions of you guys;
I create a EmbeddedSolrServer in my Servlet:
public void init() throws ServletException {
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = null;
    try {
        coreContainer = initializer.initialize();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    // empty string selects the default core
    _solrServer = new EmbeddedSolrServer(coreContainer, "");
}

And I can now trigger the index by call:
http://localhost:8080/testservlet01. The servlet does this:
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1", 1.0f);
doc1.addField("name", "doc1", 1.0f);

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
try {
    _solrServer.add(docs);
    _solrServer.commit();
} catch (SolrServerException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

However, it seems the search doesn't return the new document unless I restart
my application:
localhost:8080/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


I guess there are two SolrServer instances (one is the EmbeddedSolrServer
created by myself, and the other comes with Solr itself), and they are
holding different indexes?

How can I make them synchronized?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583741.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ask about the question of solr cache

2011-12-13 Thread Samuel García Martínez
Is it possible that your Solr client (or the way you communicate with it)
is affected by HTTP caching?

If you are using a browser to confirm these updates and commits,
try disabling HTTP caching.

On Tue, Dec 13, 2011 at 3:24 PM, Erick Erickson erickerick...@gmail.comwrote:

 Are you sure you commit after you're done? If you change
 the index, this should all be automatic. Although that doesn't
 make a lot of sense if you restart Solr because the
 changes would probably be lost then.

 But I'm a bit confused about what you mean by caches
 not being updated. Do you mean search results?

 Some details about how you're verifying that the results
 aren't available would be helpful.

 This should be all automatic, so my first guess would be
 that you're not doing what you think you are G.

 Best
 Erick

 On Mon, Dec 12, 2011 at 6:21 AM, JiaoyanChen chen00...@gmail.com wrote:
  When I have delete or add data by application through solrj, or have
  import index through command nutch solrindex, the cache of solr are not
  changed if I do not restart solr.
  Could anyone tell me how could I update solr cache without restarting
  using shell command?
  When I recreate the index by nutch, I should update data in solr.
  I use java -jar start.jar to publish solr.
  Thanks!
 




-- 
Un saludo,
Samuel García.


Re: Migrate Lucene 2.9 To SOLR

2011-12-13 Thread Robert Stewart
I am about to try exact same thing, running SOLR on top of Lucene indexes 
created by Lucene.Net 2.9.2.  AFAIK, it should work.  Not sure if indexes 
become non-backwards compatible once any new documents are written to them by 
SOLR though.  Probably good to make a backup first.

On Dec 13, 2011, at 4:34 PM, Anderson vasconcelos wrote:

 Hi
 
 I have a old project that use Lucene 2.9. Its possible to use the index
 created by lucene in SOLR? May i just copy de index to data directory of
 SOLR, or exists some mechanism to import Lucene index?
 
 Thanks



Re: Suggest component

2011-12-13 Thread kmf
I think I may have solved my problem. I'm not 100% certain what the solution
was, because I've been trying so many things, but in the end what I did was
revisit this article and re-step my configuration.

http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/

I believe the problem was that I didn't create a
firstSearcher listener to ensure that the dictionary was built on start/restart.

Hopefully that helps anyone else who may run across this issue too.
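For anyone hitting the same thing: a firstSearcher listener in solrconfig.xml looks roughly like this (the warming query and the spellcheck parameter names are illustrative; match them to your own suggest/spellcheck handler config):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- fire one warming query at startup so the dictionary gets built -->
    <lst>
      <str name="q">anything</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </arr>
</listener>
```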

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggest-component-tp2725438p3583831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Virtual Memory very high

2011-12-13 Thread Yury Kats
On 12/13/2011 6:16 AM, Dmitry Kan wrote:
 If you allow me to chime in, is there a way to check for which
 DirectoryFactory is in use, if
 ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured?

I think you can get the currently used factory in a Luke response, if you hit 
your Solr server with a Luke request,
e.g. http://localhost:8983/solr/admin/luke
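
For completeness, the factory itself is chosen in solrconfig.xml; a minimal sketch mirroring the property syntax Dmitry quoted:

```xml
<!-- If the solr.directoryFactory system property is set it wins;
     otherwise the default named after the colon is used. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
```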


 
 Dmitry
 
 2011/12/12 Yury Kats yuryk...@yahoo.com
 
 On 12/11/2011 4:57 AM, Rohit wrote:
 What are the difference in the different DirectoryFactory?


 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html

 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html

 



Re: Difference between field collapsing and result grouping

2011-12-13 Thread Chris Hostetter

: Nope, they're the same. The original name was Field Collapsing,
: but it was changed to Grouping later.

Specifically: Field Collapsing is one type of use case for the more general 
concept of Result Grouping (other types of result grouping are group by 
query, group by function results, etc.).

: But note that the functionality has changed over time, so you might
: be seeing documents from different incarnations of the code.

Right ... there was a widely used patch with param names that were very 
specific to the idea of field collapsing; that patch evolved heavily, hence 
the different param names.


-Hoss


Re: Maximum File Size Handled by post.jar / Speed of Deletes?

2011-12-13 Thread Chris Hostetter

: We would like to know is there a maximum size of a xml file that can be
: posted to Solr using the post.jar, maximum number of docs, etc. at one time
: as well as how fast deletes can be achieved.

post.jar is provided purely as an extremely trivial tool for 
beginners to manually post arbitrary files to Solr while reading 
the tutorial and first getting started with Solr.

It is not intended to be a tool for use in production systems
doing automated and/or programmatic loading of data.  (If curl 
existed ubiquitously on every computer in the world capable of running 
Java, post.jar would never have been created.)

If you want custom tools that talk to Solr for loading/deleting 
docs, use whatever Solr client or HTTP library you are comfortable with in 
whatever programming language you are already using.  If you are using 
Java, please look at the SolrJ client library.
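
As an illustration of the curl alternative, a hedged sketch (host, port, and the delete query below are hypothetical placeholders, not from this thread):

```shell
# Build a delete-by-query payload, then post it with curl instead of post.jar.
PAYLOAD=$(mktemp)
cat > "$PAYLOAD" <<'EOF'
<delete><query>category:obsolete</query></delete>
EOF
echo "wrote $PAYLOAD"
# curl "http://localhost:8983/solr/update?commit=true" \
#      -H 'Content-Type: text/xml' --data-binary @"$PAYLOAD"
```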



-Hoss


RE: MoreLikeThis questions

2011-12-13 Thread Chris Hostetter


: I'm implementing a MoreLikeThis search.  I have a couple of questions.  
: I'm implementing this with solrj so I would appreciate it if any code 
: snippets reflect that.
: 
: First, I want to provide the text that Solr should check for 
: interesting words and do the search on.  This means I don't want to 
: specify a document in the collection.  I think the documentation implies 

You'll want to make sure you're using the MLT *Handler* (not the MLT 
Component) to specify text in that way.  The key phrase you'll want to pay 
attention to is "Content Stream".

ContentStream is the general abstraction for streaming data to a Solr 
RequestHandler, either via raw HTTP POST, or HTTP multi-part MIME, or by 
asking Solr to pull from a remote URL or local file...

https://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams
https://wiki.apache.org/solr/ContentStream

...the ContentStreamUpdateRequest class is somewhat poorly named, because 
there is nothing about it that requires you to use it for update 
requests (that's just the primary use of Content Streams); you should 
certainly be able to use it to stream content from your SolrJ client to the 
MLT Handler...

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html


-Hoss
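
A hedged sketch of what such a content-stream request could look like over plain HTTP (the host, /mlt handler path, and field name are assumptions, not from this thread):

```shell
# Build an MLT request that streams free text via the stream.body parameter.
TEXT='open source search engine'
ENCODED=$(printf '%s' "$TEXT" | tr ' ' '+')   # crude URL encoding for the demo
URL="http://localhost:8983/solr/mlt?mlt.fl=body&stream.body=$ENCODED"
echo "$URL"
# curl "$URL"   # uncomment against a running Solr with an /mlt handler
```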


Re: Reducing heap space consumption for large dictionaries?

2011-12-13 Thread Maciej Lisiewski

On 2011-12-13 05:48, Chris Male wrote:

Hi,

It's good to hear some feedback on using the Hunspell dictionaries.
Lucene's support is pretty new, so we're obviously looking to improve it.
Could you open a JIRA issue so we can explore whether there are ways
to reduce memory consumption?


Done:
https://issues.apache.org/jira/browse/SOLR-2968


--
Maciej Lisiewski


Re: Too many connections in CLOSE_WAIT state on master solr server

2011-12-13 Thread samarth s
The updates to the master are user driven and need to be
visible quickly, hence the high frequency of replication. It may be
that too many replication requests are being handled at a time, but
why should that result in half-closed connections?

On Wed, Dec 14, 2011 at 2:47 AM, Erick Erickson erickerick...@gmail.com wrote:
 Replicating 40 cores every 20 seconds is just *asking* for trouble.
 How often do your cores change on the master? How big are
 they? Is there any chance you just have too many cores replicating
 at once?

 Best
 Erick

 On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
 You can try to reuse your connections (prevent them from closing) by
 specifying -Dhttp.maxConnections=N (see
 http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html)
 in JVM startup params, at the client JVM. N should be chosen considering
 the number of connections you'd like to keep alive.

 Let me know if it works for you.
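
A sketch of what that could look like on the slave's command line (the value 50 is illustrative, not a recommendation from this thread):

```shell
# Client-side JVM properties that keep HTTP connections pooled and alive.
JAVA_OPTS="-Dhttp.maxConnections=50 -Dhttp.keepAlive=true"
echo "java $JAVA_OPTS -jar start.jar"   # shown rather than executed here
# exec java $JAVA_OPTS -jar start.jar
```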

 On Tue, Dec 13, 2011 at 2:57 PM, samarth s 
 samarth.s.seksa...@gmail.comwrote:

 Hi,

 I am using solr replication and am experiencing a lot of connections
 in the state CLOSE_WAIT at the master solr server. These disappear
 after a while, but till then the master solr stops responding.

 There are about 130 open connections on the master server with the
 client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
 client port specified on the master solr server netstat results is not
 visible in the netstat results on the client (slave solr) m/c.

 Following is my environment:
 - 40 cores in the master solr on m/c 1
 - 40 cores in the slave solr on m/c 2
 - The replication poll interval is 20 seconds.
 - Replication part in solrconfig.xml in the slave solr:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <!-- fully qualified url for the replication handler of master -->
     <str name="masterUrl">$mastercorename/replication</str>
     <!-- Interval at which the slave should poll the master.
          Format is HH:mm:ss. If this is absent, the slave does not poll
          automatically, but a fetchindex can still be triggered from
          the admin UI or the HTTP API. -->
     <str name="pollInterval">00:00:20</str>
     <!-- The following values are used when the slave connects to the
          master to download the index files. Default values are implicitly
          set to 5000ms and 10000ms respectively. The user does NOT need to
          specify these unless the bandwidth is extremely low or the
          latency is extremely high. -->
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">10000</str>
   </lst>
 </requestHandler>

 Thanks for any pointers.

 --
 Regards,
 Samarth




 --
 Sincerely yours
 Mikhail Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com



-- 
Regards,
Samarth


Re: How to get SolrServer within my own servlet

2011-12-13 Thread yunfei wu
Just curious: it sounds like you are trying to deploy your servlet inside
Solr. Why don't you deploy your app as a separate servlet alongside the Solr
war on the server, and then let your servlet send requests to Solr? This will
make it much easier to maintain your app and Solr independently.

Yunfei


On Tuesday, December 13, 2011, Joey vanjo...@gmail.com wrote:
 Thank you guys for the reply.

 So what I want to do is to modify Solr a bit - add one servlet so I can
 trigger a full index of a folder in the file system.

 What I did:
   un-jar solr.war;
   Create a web app and copy the un-jarred Solr files into this app;
   Create my servlet;
   Repackage the web app to a war and deploy;

 Following your suggestions,
 I created an EmbeddedSolrServer in my servlet:
public void init() throws ServletException {
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = null;
    try {
        coreContainer = initializer.initialize();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    _solrServer = new EmbeddedSolrServer(coreContainer, "");
}

 And I can now trigger the indexing by calling
 http://localhost:8080/testservlet01. The servlet does this:
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1", 1.0f);
doc1.addField("name", "doc1", 1.0f);

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
try {
    _solrServer.add(docs);
    _solrServer.commit();
} catch (SolrServerException e) {
    e.printStackTrace();
}

 However, it seems the search doesn't return the new document unless I restart
 my application:
 localhost:8080/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


 I guess there are two SolrServer instances (one is the EmbeddedSolrServer
 created by myself, and the other comes with Solr itself), and they are
 holding different indexes?

 How can I keep them synchronized?

 --
 View this message in context:
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583741.html
 Sent from the Solr - User mailing list archive at Nabble.com.