Re: Data from deleted from Solr (Solr cloud)

2012-12-20 Thread John Nielsen
Yeah, I ran into this issue myself with solr-4.0.0.

To fix it, I had to compile my own version from the solr-4x branch. That
is, I assume it's fixed, as I have been unable to reproduce it since the
switch.

I'm afraid you will have to reindex your data.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


On Wed, Dec 19, 2012 at 5:08 PM, shreejay shreej...@gmail.com wrote:

 Hi All,

 I have a SolrCloud instance with 3 shards. Each shard has 2 instances (2
 servers, each running an instance of Solr).

 Let's say I had instance1 and instance2 in shard1 … At some point, instance2
 went down due to an OOM (out of memory) error. instance1 for some reason was
 not replicating the data properly, and when it became the leader it had only
 around 1% of the data that instance2 had. I restarted instance2 and hoped
 that instance1 would replicate from instance2, but instead instance2
 replicated from instance1, and ended up deleting the original index folder it
 had. There were around 2 million documents in that instance.

 Can any of the SolrCloud users give any hints on whether I can recover this data?




 --Shreejay



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Data-from-deleted-from-Solr-Solr-cloud-tp4028055.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other disaster more 
quickly than a re-index operation would yield.


I can't yet afford the extravagance of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
currently running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various downsides:

1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow, and I/O- and CPU-intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure how, 
maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability 
of SolR, probably by displaying a down for maintenance message, but 
this should preferably be only a very short amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 15:46, Andy D'Arcy Jewell
andy.jew...@sysmicro.co.uk wrote:
 Hi all.

 Can anyone advise me of a way to pause and resume SolR 4 so I can perform a
 backup? I need to be able to revert to a usable (though not necessarily
 complete) index after a crash or other disaster more quickly than a
 re-index operation would yield.
[...]

Unless I am missing something, the index is only written to
when you are adding/updating documents. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora


RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the replication handler to fetch a complete snapshot of the index 
over HTTP.
http://wiki.apache.org/solr/SolrReplication#HTTP_API
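
For example, a minimal sketch of driving that API with curl (host, port and
core path are assumptions; adjust for your deployment):

  # trigger a snapshot of the current index
  curl 'http://localhost:8983/solr/replication?command=backup'
  # poll the handler for snapshot status/details afterwards
  curl 'http://localhost:8983/solr/replication?command=details'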
 
 
-Original message-
 From:Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk
 Sent: Thu 20-Dec-2012 11:23
 To: solr-user@lucene.apache.org
 Subject: Pause and resume indexing on SolR 4 for backups
 
 Hi all.
 
 Can anyone advise me of a way to pause and resume SolR 4 so I can 
 perform a backup? I need to be able to revert to a usable (though not 
 necessarily complete) index after a crash or other disaster more 
 quickly than a re-index operation would yield.
 
 I can't yet afford the extravagance of a separate SolR replica just 
 for backups, and I'm not sure if I'll ever have the luxury. I'm 
 currently running with just one node, but we are not yet live.
 
 I can think of the following ways to do this, each with various downsides:
 
 1) Just backup the existing index files whilst indexing continues
  + Easy
  + Fast
  - Incomplete
  - Potential for corruption? (e.g. partial files)
 
 2) Stop/Start Tomcat
  + Easy
  - Very slow, and I/O- and CPU-intensive
  - Client gets errors when trying to connect
 
 3) Block/unblock SolR port with IpTables
  + Fast
  - Client gets errors when trying to connect
  - Have to wait for existing transactions to complete (not sure how, 
 maybe watch socket FD's in /proc)
 
 4) Pause/Restart SolR service
  + Fast ? (hopefully)
  - Client gets errors when trying to connect
 
 In any event, the web app will have to gracefully handle unavailability 
 of SolR, probably by displaying a down for maintenance message, but 
 this should preferably be only a very short amount of time.
 
 Can anyone comment on my proposed solutions above, or provide any 
 additional ones?
 
 Thanks for any input you can provide!
 
 -Andy
 
 -- 
 Andy D'Arcy Jewell
 
 SysMicro Limited
 Linux Support
 E:  andy.jew...@sysmicro.co.uk
 W:  www.sysmicro.co.uk
 
 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 10:24, Gora Mohanty wrote:


Unless I am missing something, the index is only written to
when you are adding/updating documents. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora
It's attached to a web-app, which accepts uploads and will be available 
24/7, with a global audience, so pausing it may be rather difficult 
(though I may put this to the developer - it may, for instance, be possible 
if he has a small number of choke points for input into SolR).


Thanks.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 16:14, Andy D'Arcy Jewell
andy.jew...@sysmicro.co.uk wrote:
[...]
 It's attached to a web-app, which accepts uploads and will be available
 24/7, with a global audience, so pausing it may be rather difficult (though I
 may put this to the developer - it may, for instance, be possible if he has a
 small number of choke points for input into SolR).
[...]

It adds work for the web developer, but one could pause indexing,
put indexing requests into some kind of a queuing system, do the
backup, and flush the queue when the backup is done.

Regards,
Gora


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
I've never used it, but the replication handler has an option:

  http://master_host:port/solr/replication?command=backup 

Which will take you a backup.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
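
As a sketch (paths are assumptions; point it at your own data directory):

  # near-instant snapshot using hard links; no extra disk space at first
  cp -lr /var/lib/solr/data/index /var/lib/solr/data/index.bak.$(date +%Y%m%d%H%M%S)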

Upayavira

On Thu, Dec 20, 2012, at 10:34 AM, Markus Jelsma wrote:
 You can use the replication handler to fetch a complete snapshot of the
 index over HTTP.
 http://wiki.apache.org/solr/SolrReplication#HTTP_API
  
  
 -Original message-
  From:Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk
  Sent: Thu 20-Dec-2012 11:23
  To: solr-user@lucene.apache.org
  Subject: Pause and resume indexing on SolR 4 for backups
  
  Hi all.
  
  Can anyone advise me of a way to pause and resume SolR 4 so I can 
  perform a backup? I need to be able to revert to a usable (though not 
  necessarily complete) index after a crash or other disaster more 
  quickly than a re-index operation would yield.
  
  I can't yet afford the extravagance of a separate SolR replica just 
  for backups, and I'm not sure if I'll ever have the luxury. I'm 
  currently running with just one node, but we are not yet live.
  
  I can think of the following ways to do this, each with various downsides:
  
  1) Just backup the existing index files whilst indexing continues
   + Easy
   + Fast
   - Incomplete
   - Potential for corruption? (e.g. partial files)
  
  2) Stop/Start Tomcat
   + Easy
   - Very slow, and I/O- and CPU-intensive
   - Client gets errors when trying to connect
  
  3) Block/unblock SolR port with IpTables
   + Fast
   - Client gets errors when trying to connect
   - Have to wait for existing transactions to complete (not sure how, 
  maybe watch socket FD's in /proc)
  
  4) Pause/Restart SolR service
   + Fast ? (hopefully)
   - Client gets errors when trying to connect
  
  In any event, the web app will have to gracefully handle unavailability 
  of SolR, probably by displaying a down for maintenance message, but 
  this should preferably be only a very short amount of time.
  
  Can anyone comment on my proposed solutions above, or provide any 
  additional ones?
  
  Thanks for any input you can provide!
  
  -Andy
  
  -- 
  Andy D'Arcy Jewell
  
  SysMicro Limited
  Linux Support
  E:  andy.jew...@sysmicro.co.uk
  W:  www.sysmicro.co.uk
  
  


Re: Finding the last committed record in SOLR 4

2012-12-20 Thread Upayavira
I cannot see how SolrJ and the admin UI would return different results.
Could you run exactly the same query on both and show what you get here?

Upayavira

On Thu, Dec 20, 2012, at 06:17 AM, Joe wrote:
 I'm using SOLR 4 for an application, where I need to search the index soon
 after inserting records.
 
 I'm using the solrj code below to get the last ID in the index. However, I
 noticed that the last id I see when I execute a query through the solr web
 admin is often lagging behind this, and that my searches are not including
 all documents up until the last ID I get from the code snippet below. I'm
 guessing this is because of delays in hard commits. I don't need to switch
 to soft commits yet. I just want to make sure that I get the ID of the last
 searchable document. Is this possible to do?
 
 
    SolrQuery query = new SolrQuery();
    query.set("qt", "/select");
    query.setQuery("*:*");
    query.setFields("id");
    query.set("rows", 1);
    query.set("sort", "id desc");

    QueryResponse rsp = m_Server.query(query);
    SolrDocumentList docs = rsp.getResults();
    SolrDocument doc = docs.get(0);
    long id = (Long) doc.getFieldValue("id");
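 
 The same query over plain HTTP is a quick way to compare what SolrJ returns
 with what the admin UI shows (host and core path are assumptions):
 
   curl 'http://localhost:8983/solr/select?q=*:*&fl=id&rows=1&sort=id+desc'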

 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Finding-the-last-committed-record-in-SOLR-4-tp4028235.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic modification of field value

2012-12-20 Thread Upayavira
Which strikes me as the right way to go.

Upayavira

On Thu, Dec 20, 2012, at 12:30 PM, AlexeyK wrote:
 Implemented it with http://wiki.apache.org/solr/DocTransformers.
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Dynamic-modification-of-field-value-tp4028234p4028301.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 11:58, Upayavira wrote:

I've never used it, but the replication handler has an option:

   http://master_host:port/solr/replication?command=backup

Which will take you a backup.
I've looked at that this morning as suggested by Markus Jelsma. Looks 
good, but I'll have to work out how to use the resultant backup 
directory. I've been dealing with another unrelated issue in the 
mean-time and I haven't had a chance to look for any docu so far.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
That's the approach that Shawn Heisey proposed, and what I've been 
working towards,  but it still leaves open the question of how to 
*pause* SolR or prevent commits during the backup (otherwise we have a 
potential race condition).


-Andy


--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.
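
A rough sketch of that idea in shell (the log path and the exact commit log
message are assumptions; verify against what your own setup actually logs):

  # wait until no commit has been logged for 10 seconds, then snapshot
  LOG=/var/log/tomcat6/catalina.out
  while :; do
    before=$(grep -c 'end_commit_flush' "$LOG")
    sleep 10
    [ "$(grep -c 'end_commit_flush' "$LOG")" = "$before" ] && break
  done
  cp -lr /var/lib/solr/data/index "/tmp/snapshot.$(date +%Y%m%d%H%M%S)"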

Upayavira

On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
 On 20/12/12 11:58, Upayavira wrote:
  I've never used it, but the replication handler has an option:
 
 http://master_host:port/solr/replication?command=backup
 
  Which will take you a backup.
 I've looked at that this morning as suggested by Markus Jelsma. Looks 
 good, but I'll have to work out how to use the resultant backup 
 directory. I've been dealing with another unrelated issue in the 
 mean-time and I haven't had a chance to look for any docu so far.
  Also something to note, if you don't want to use the above, and you are
  running on Unix, you can create fast 'hard link' clones of lucene
  indexes. Doing:
 
  cp -lr data data.bak
 
  will copy your index instantly. If you can avoid doing this when a
  commit is happening, then you'll have a good index copy, that will take
  no space on your disk and be made instantly. This is because it just
  copies the directory structure, not the files themselves, and given
  files in a lucene index never change (they are only ever deleted or
  replaced), this works as a good copy technique for backing up.
 That's the approach that Shawn Heisey proposed, and what I've been 
 working towards,  but it still leaves open the question of how to 
 *pause* SolR or prevent commits during the backup (otherwise we have a 
 potential race condition).
 
 -Andy
 
 
 -- 
 Andy D'Arcy Jewell
 
 SysMicro Limited
 Linux Support
 E:  andy.jew...@sysmicro.co.uk
 W:  www.sysmicro.co.uk
 


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Upayavira
Personally I have never given it any attention, so I suspect it doesn't
matter much.

Upayavira

On Thu, Dec 20, 2012, at 05:08 AM, Alexandre Rafalovitch wrote:
 Hello,
 
 In the schema.xml, we have a name attribute on the root node. The
 documentation says it is for display purpose only. But for display where?
 
 It seems that the admin console uses the name in the solr.xml file
 instead.
 And deleting the name attribute does not seem to cause any problems
 either.
 
 The reason I ask is because I am writing an explanation example which
 involves schema.xml config file being copied and modified over and over
 again. If @name is significant, I need to mention changing it. If not, I
 will just delete it altogether.
 
 Regards,
Alex.
 
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the postCommit event in updateHandler to execute a task. 
 
-Original message-
 From:Upayavira u...@odoko.co.uk
 Sent: Thu 20-Dec-2012 14:45
 To: solr-user@lucene.apache.org
 Subject: Re: Pause and resume indexing on SolR 4 for backups
 
 The backup directory should just be a clone of the index files. I'm
 curious to know whether it is a cp -r or a cp -lr that the replication
 handler produces.
 
 You would prevent commits by telling your app not to commit. That is,
 Solr only commits when it is *told* to.
 
 Unless you use autocommit, in which case I guess you could monitor your
 logs for the last commit, and do your backup 10 seconds after that.
 
 Upayavira
 
 On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
  On 20/12/12 11:58, Upayavira wrote:
   I've never used it, but the replication handler has an option:
  
  http://master_host:port/solr/replication?command=backup
  
   Which will take you a backup.
  I've looked at that this morning as suggested by Markus Jelsma. Looks 
  good, but I'll have to work out how to use the resultant backup 
  directory. I've been dealing with another unrelated issue in the 
  mean-time and I haven't had a chance to look for any docu so far.
   Also something to note, if you don't want to use the above, and you are
   running on Unix, you can create fast 'hard link' clones of lucene
   indexes. Doing:
  
   cp -lr data data.bak
  
   will copy your index instantly. If you can avoid doing this when a
   commit is happening, then you'll have a good index copy, that will take
   no space on your disk and be made instantly. This is because it just
   copies the directory structure, not the files themselves, and given
   files in a lucene index never change (they are only ever deleted or
   replaced), this works as a good copy technique for backing up.
  That's the approach that Shawn Heisey proposed, and what I've been 
  working towards,  but it still leaves open the question of how to 
  *pause* SolR or prevent commits during the backup (otherwise we have a 
  potential race condition).
  
  -Andy
  
  
  -- 
  Andy D'Arcy Jewell
  
  SysMicro Limited
  Linux Support
  E:  andy.jew...@sysmicro.co.uk
  W:  www.sysmicro.co.uk
  
 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
That's neat, but wouldn't that run on every commit? How would you use it
to, say, back up once a day?

Upayavira

On Thu, Dec 20, 2012, at 01:57 PM, Markus Jelsma wrote:
 You can use the postCommit event in updateHandler to execute a task. 
  
 -Original message-
  From:Upayavira u...@odoko.co.uk
  Sent: Thu 20-Dec-2012 14:45
  To: solr-user@lucene.apache.org
  Subject: Re: Pause and resume indexing on SolR 4 for backups
  
  The backup directory should just be a clone of the index files. I'm
  curious to know whether it is a cp -r or a cp -lr that the replication
  handler produces.
  
  You would prevent commits by telling your app not to commit. That is,
  Solr only commits when it is *told* to.
  
  Unless you use autocommit, in which case I guess you could monitor your
  logs for the last commit, and do your backup 10 seconds after that.
  
  Upayavira
  
  On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
   On 20/12/12 11:58, Upayavira wrote:
I've never used it, but the replication handler has an option:
   
   http://master_host:port/solr/replication?command=backup
   
Which will take you a backup.
   I've looked at that this morning as suggested by Markus Jelsma. Looks 
   good, but I'll have to work out how to use the resultant backup 
   directory. I've been dealing with another unrelated issue in the 
   mean-time and I haven't had a chance to look for any docu so far.
Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:
   
cp -lr data data.bak
   
will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
   That's the approach that Shawn Heisey proposed, and what I've been 
   working towards,  but it still leaves open the question of how to 
   *pause* SolR or prevent commits during the backup (otherwise we have a 
   potential race condition).
   
   -Andy
   
   
   -- 
   Andy D'Arcy Jewell
   
   SysMicro Limited
   Linux Support
   E:  andy.jew...@sysmicro.co.uk
   W:  www.sysmicro.co.uk
   
  


Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Modou DIA
Hi everybody,

I'm a newbie with Solr technologies, but in the past I worked with Lucene
and another solution similar to Solr.
I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
a Cocoon 2.1 application.

I want to know if it's possible to store (without indexing) a field
containing an XML sequence. I mean a field which can store XML data in
the index without losing XPath information.

For example, this is a document to index:

<add>
  <doc>
    <field name="id">id_1</field>
    <field name="info">testing</field>
    <field name="subdoc">
      <subdoc id="id_1">
        <data>testing</data>
      </subdoc>
    </field>
  </doc>
  ...
</add>

As you can see, the field named subdoc contains an XML sequence.

So, when I query the indexes, I want to retrieve the data in subdoc
and preserve the XML markup.

Thank you for your help.
-- 
--
| Modou DIA
| modo...@gmail.com
--


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Jack Krupansky
I checked the 4.x source code and except for the fact that you will get a 
warning if you leave it out, nothing uses that name. But... that's not to 
say that a future release might not require it - the doc/comments don't 
explicitly say that it is optional.


Note that the version attribute is optional (as per the source code, but no 
mention in doc/comments) and defaults to 1.0, with no warning.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lucene.apache.org
Subject: Where does schema.xml's schema/@name displays?

Hello,

In the schema.xml, we have a name attribute on the root node. The
documentation says it is for display purpose only. But for display where?

It seems that the admin console uses the name in the solr.xml file instead.
And deleting the name attribute does not seem to cause any problems either.

The reason I ask is because I am writing an explanation example which
involves schema.xml config file being copied and modified over and over
again. If @name is significant, I need to mention changing it. If not, I
will just delete it altogether.

Regards,
  Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



Re: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

2012-12-20 Thread Nalini Kartha
Hi James,

I don't get how the spellcheck.maxResultsForSuggest param helps with making
sure that the suggestions returned satisfy the fq params?

That's the main problem we're trying to solve, how often suggestions are
being returned is not really an issue for us at the moment.

Thanks,
Nalini


On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 Instead of using spellcheck.collateParam.mm, try just setting
 spellcheck.maxResultsForSuggest to a very high value (you can use up to
 Integer.MAX_VALUE here).  So long as the user gets fewer results than
 whatever this is set for, you will get suggestions (and collations if
 desired).  I was just playing with this and if I am understanding you
 correctly think this combination of parameters will give you what you want:

 spellcheck=true

 spellcheck.dictionary=whatever

 spellcheck.maxResultsForSuggest=1000 (or whatever the cut off is
 before you don't want suggestions)

 spellcheck.count=20 (+/- depending on performance vs # suggestions
 required)

 spellcheck.maxCollationTries=10 (+/- depending on performance vs #
 suggestions required)

 spellcheck.maxCollations=10 (+/- depending on performance vs # suggestions
 required)

 spellcheck.collate=true

 spellcheck.collateExtendedResults=true
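 
 Assembled into one request, that might look like this sketch (host, core and
 handler path are assumptions):
 
   curl -G 'http://localhost:8983/solr/select' \
     --data-urlencode 'q=christmas wrapping papr' \
     --data-urlencode 'mm=0' \
     --data-urlencode 'spellcheck=true' \
     --data-urlencode 'spellcheck.dictionary=whatever' \
     --data-urlencode 'spellcheck.maxResultsForSuggest=1000' \
     --data-urlencode 'spellcheck.count=20' \
     --data-urlencode 'spellcheck.maxCollationTries=10' \
     --data-urlencode 'spellcheck.maxCollations=10' \
     --data-urlencode 'spellcheck.collate=true' \
     --data-urlencode 'spellcheck.collateExtendedResults=true'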

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Nalini Kartha [mailto:nalinikar...@gmail.com]
 Sent: Wednesday, December 19, 2012 2:06 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
 params for default OR query

 Hi James,

 Yup, the example you gave about sums it up. The reason we use an OR query is
 that we want the flexibility of every term not having to match, but when it
 comes to corrections we want to be sure that the ones we pick will actually
 return results (we message the user with the corrected query, so it would be
 bad/confusing if there were no matches for the corrections).

 *- by default the spellchecker doesn't see this as a problem because it has
 hits (mm=0 and "wrapping" matches something).  So you get neither
 individual words back nor collations from the spellchecker.*

 I think we would still get back 'papr' -> 'paper' as a correction and
 'christmas wrapping paper' as a collation in this case - I've seen that
 corrections are returned even for OR queries. Problem is these would be
 returned even if 'paper' doesn't exist in any docs that have item:in_stock.

 *- with spellcheck.collateParam.mm=100
 it tries to fix both "papr" and "christmas" but can't fix "christmas"
 because spelling isn't the problem here (it is an irrelevant term not in
 the index).  So while you get words suggested there are no collations.  The
 individual words would be helpful, but you're not sure because they might
 all apply to items that do not match fq=item:in_stock.*

 Yup, exactly.

 Do you think the workaround I suggested would work (and not have terrible
 perf)? Or any other ideas?

 Thanks,
 Nalini


 On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James
 james.d...@ingramcontent.com wrote:

  Let me try and get a better idea of what you're after.  Is it that your
  users might query a combination of irrelevant terms and misspelled terms,
  so you want the ability to ignore the irrelevant terms but still get
  suggestions for the misspelled terms?
 
  For instance if someone wanted q=christmas wrapping
  papr&mm=0&fq=item:in_stock, but "christmas" was not in the index and you
  wanted to return results for just "wrapping paper", the problem is...
 
  - by default the spellchecker doesn't see this as a problem because it has
  hits (mm=0 and "wrapping" matches something).  So you get neither
  individual words back nor collations from the spellchecker.
 
  - with spellcheck.collateParam.mm=100 it tries to fix both "papr" and
  "christmas" but can't fix "christmas" because spelling isn't the problem
  here (it is an irrelevant term not in the index).  So while you get words
  suggested there are no collations.  The individual words would be
 helpful,
  but you're not sure because they might all apply to items that do not
 match
  fq=item:in_stock.
 
  Is this the problem?
 
  James Dyer
  E-Commerce Systems
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Nalini Kartha [mailto:nalinikar...@gmail.com]
  Sent: Wednesday, December 19, 2012 11:20 AM
  To: solr-user@lucene.apache.org
  Subject: Ensuring SpellChecker returns corrections which satisfy fq
 params
  for default OR query
 
  Hi,
 
  With the DirectSolrSpellChecker, we want to be able to make sure that the
  corrections that are being returned satisfy the fq params of the original
  query.
 
  The collate functionality helps with this but seems to only work with
  default AND queries - our use case is for default OR queries. I also saw
  that there is now a spellcheck.collateParam.XX param which allows you to
  override params 

Re: SolrCloud: only partial results returned

2012-12-20 Thread Mark Miller
Does all the data have unique ids?

- Mark

On Dec 19, 2012, at 8:30 PM, Lili lyuan1...@gmail.com wrote:

 We set up SolrCloud with 2 shards and separate multiple ZooKeepers.   The
 data added using HTTP POST with JSON from the tutorial sample is not
 completely returned in queries. However, if you send the same HTTP POST
 request again, or shut down the Solr instance and restart, the complete
 results will be returned.

 We have tried adding distrib=true to the query, or even adding shards=..
 Still, only partial results are returned.
 
 This happened with embeded zookeepers too.
 
 However,  this doesn't seem to happen if you add data with xml in tutorial
 samples.
 
 Any thoughts on what might be wrong or is it a known issue?
 
 
 Thanks,
 
 Lili
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-only-partial-results-returned-tp4028200.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

2012-12-20 Thread Dyer, James
The spellchecker doesn't support checking the individual words against the index 
with fq applied.  This is only done for collations (and only if 
maxCollationTries is greater than 0).  Checking every suggested word 
individually against the index after applying filter queries is probably going 
to be very expensive no matter how you implement it.  However, someone with 
more lucene-core knowledge than I have might be able to give you better advice.

If your root problem, though, is getting good did-you-mean-style suggestions 
with dismax queries and mm=0, and if you want to consider the case where some 
words in the query are misspelled and others are entirely irrelevant (and can't 
be corrected), then setting maxResultsForSuggest to a high value might give 
you the end result you want.  Unlike if you use 
spellcheck.collateParam.mm=100%, it won't insist that the irrelevant terms 
(or a corrected irrelevant term) match anything.  On the other hand, it won't 
assume the query is correctly spelled just because you got some hits from it 
(because mm=0 will just cause the misspelled terms to be thrown out).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 20, 2012 8:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq params 
for default OR query

Hi James,

I don't get how the spellcheck.maxResultsForSuggest param helps with making
sure that the suggestions returned satisfy the fq params?

That's the main problem we're trying to solve, how often suggestions are
being returned is not really an issue for us at the moment.

Thanks,
Nalini


On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 Instead of using spellcheck.collateParam.mm, try just setting
 spellcheck.maxResultsForSuggest to a very high value (you can use up to
 Integer.MAX_VALUE here).  So long as the user gets fewer results than
 whatever this is set for, you will get suggestions (and collations if
 desired).  I was just playing with this and if I am understanding you
 correctly think this combination of parameters will give you what you want:

 spellcheck=true

 spellcheck.dictionary=whatever

 spellcheck.maxResultsForSuggest=1000 (or whatever the cut off is
 before you don't want suggestions)

 spellcheck.count=20 (+/- depending on performance vs # suggestions
 required)

 spellcheck.maxCollationTries=10 (+/- depending on performance vs #
 suggestions required)

 spellcheck.maxCollations=10 (+/- depending on performance vs # suggestions
 required)

 spellcheck.collate=true

 spellcheck.collateExtendedResults=true

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Nalini Kartha [mailto:nalinikar...@gmail.com]
 Sent: Wednesday, December 19, 2012 2:06 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
 params for default OR query

 Hi James,

 Yup, the example you gave about sums it up. The reason we use an OR query is
 that we want the flexibility of every term not having to match, but when it
 comes to corrections we want to be sure that the ones we pick will actually
 return results (we message the user with the corrected query, so it would be
 bad/confusing if there were no matches for the corrections).

 *- by default the spellchecker doesn't see this as a problem because it has
 hits (mm=0 and "wrapping" matches something).  So you get neither
 individual words back nor collations from the spellchecker.*

 I think we would still get back 'papr' -> 'paper' as a correction and
 'christmas wrapping paper' as a collation in this case - I've seen that
 corrections are returned even for OR queries. Problem is these would be
 returned even if 'paper' doesn't exist in any docs that have item:in_stock.

 *- with spellcheck.collateParam.mm=100
 it tries to fix both "papr" and "christmas" but can't fix "christmas"
 because spelling isn't the problem here (it is an irrelevant term not in
 the index).  So while you get words suggested there are no collations.  The
 individual words would be helpful, but you're not sure because they might
 all apply to items that do not match fq=item:in_stock.*

 Yup, exactly.

 Do you think the workaround I suggested would work (and not have terrible
 perf)? Or any other ideas?

 Thanks,
 Nalini


 On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James
 james.d...@ingramcontent.com wrote:

  Let me try and get a better idea of what you're after.  Is it that your
  users might query a combination of irrelevant terms and misspelled terms,
  so you want the ability to ignore the irrelevant terms but still get
  suggestions for the misspelled terms?
 
  For instance if someone wanted q=christmas wrapping
  papr&mm=0&fq=item:in_stock, but "christmas" was not in the index and you
  wanted to 

Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 13:38, Upayavira wrote:

The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.


Hmm. Strange - the files created by the backup API don't seem to 
correlate exactly with the files stored under the solr data directory:


andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need 
to do the following?


1. Stop Tomcat (or maybe just solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
/var/lib/solr/data/index/

4. Restart Tomcat (or just solr)
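
As a shell sketch of those steps (service name is an assumption; adjust for
your init system):

  service tomcat6 stop
  rm -rf /var/lib/solr/data/index/*
  cp -r /tmp/snapshot.20121220155853703/* /var/lib/solr/data/index/
  service tomcat6 start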


Thanks everyone who's pitched in on this! Once I've got this working, 
I'll document it.

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
Are you sure a commit didn't happen between? Also, a background merge
might have happened.

As to using a backup, you are right: just stop Solr, put the snapshot
into data/index, and restart.

Upayavira

On Thu, Dec 20, 2012, at 05:16 PM, Andy D'Arcy Jewell wrote:
 On 20/12/12 13:38, Upayavira wrote:
  The backup directory should just be a clone of the index files. I'm
  curious to know whether it is a cp -r or a cp -lr that the replication
  handler produces.
 
  You would prevent commits by telling your app not to commit. That is,
  Solr only commits when it is *told* to.
 
  Unless you use autocommit, in which case I guess you could monitor your
  logs for the last commit, and do your backup 10 seconds after that.
 
 
 Hmm. Strange - the files created by the backup API don't seem to 
 correlate exactly with the files stored under the solr data directory:
 
 andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
 /tmp/snapshot.20121220155853703/
 /tmp/snapshot.20121220155853703/_2vq.fdx
 /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
 /tmp/snapshot.20121220155853703/segments_2vs
 /tmp/snapshot.20121220155853703/_2vq_nrm.cfs
 /tmp/snapshot.20121220155853703/_2vq.fnm
 /tmp/snapshot.20121220155853703/_2vq_nrm.cfe
 /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
 /tmp/snapshot.20121220155853703/_2vq.fdt
 /tmp/snapshot.20121220155853703/_2vq.si
 /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
 andydj@me-solr01:~$ find /var/lib/solr/data/index/
 /var/lib/solr/data/index/
 /var/lib/solr/data/index/_2w6_Lucene40_0.frq
 /var/lib/solr/data/index/_2w6.si
 /var/lib/solr/data/index/segments_2w8
 /var/lib/solr/data/index/write.lock
 /var/lib/solr/data/index/_2w6_nrm.cfs
 /var/lib/solr/data/index/_2w6.fdx
 /var/lib/solr/data/index/_2w6_Lucene40_0.tip
 /var/lib/solr/data/index/_2w6_nrm.cfe
 /var/lib/solr/data/index/segments.gen
 /var/lib/solr/data/index/_2w6.fnm
 /var/lib/solr/data/index/_2w6.fdt
 /var/lib/solr/data/index/_2w6_Lucene40_0.tim
 
 Am I correct in thinking that to restore from this backup, I would need 
 to do the following?
 
 1. Stop Tomcat (or maybe just solr)
 2. Remove all files under /var/lib/solr/data/index/
 3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
 /var/lib/solr/data/index/
 4. Restart Tomcat (or just solr)
 
 
 Thanks everyone who's pitched in on this! Once I've got this working, 
 I'll document it.
 -Andy
 
 -- 
 Andy D'Arcy Jewell
 
 SysMicro Limited
 Linux Support
 E:  andy.jew...@sysmicro.co.uk
 W:  www.sysmicro.co.uk
 


Re: SolrCloud: only partial results returned

2012-12-20 Thread Lili
Mark, yes, they have unique ids. Most of the time, after the 2nd JSON HTTP
post, the query will return complete results.

I believe the data was already indexed by the 1st post, since if I shut down
Solr after the 1st post and restart it, the query will return the complete
result set.

Thanks,

Lili



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-only-partial-results-returned-tp4028200p4028367.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr/Lucene Engineer - Contract Opportunity - Raleigh, NC

2012-12-20 Thread Polak, Tom
Hi Lance,

I am an IT Recruiter in Raleigh, NC. Would you or would anyone you know be 
interested in a long term contract opportunity for a Solr/Lucene Engineer with 
Cisco here in RTP, NC?

Thanks for your time Lance and have a safe and happy Holiday!



Tom Polak
IT Recruiter


Experis IT Staffing
1122 Oberlin Road
Raleigh, NC 27605
T: 919 755 5838
F: 919 755 5828
C: 919 457 8530

tom.po...@experis.com
www.experis.com



Twitter: https://twitter.com/tphires4itinnc

LinkedIn: http://www.linkedin.com/in/tompolak

Referral Program:  Easy ...Refer an IT Professional in your network to me 
today!



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Lance Norskog
To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
that the index is never in a bogus state. All data files are written and 
flushed to disk, then the segments.* files are written that match the 
data files. You can capture the files with a set of hard links to create 
a backup.


The CheckIndex program will verify the index backup.
java -cp yourcopy/lucene-core-SOMETHING.jar 
org.apache.lucene.index.CheckIndex collection/data/index


lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
Solr is unpacked.


On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other disaster more 
quickly than a re-index operation would yield.


I can't yet afford the extravagance of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
 currently running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various 
downsides:


1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow, and I/O- and CPU-intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure 
how, maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle 
unavailability of SolR, probably by displaying a down for 
maintenance message, but this should preferably be only a very short 
amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy





Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
 To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
 that the index is never in a bogus state. All data files are written and 
 flushed to disk, then the segments.* files are written that match the 
 data files. You can capture the files with a set of hard links to create 
 a backup.
 
 The CheckIndex program will verify the index backup.
 java -cp yourcopy/lucene-core-SOMETHING.jar 
 org.apache.lucene.index.CheckIndex collection/data/index
 
 lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
 Solr is unpacked.
 
 On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
  Hi all.
 
  Can anyone advise me of a way to pause and resume SolR 4 so I can 
  perform a backup? I need to be able to revert to a usable (though not 
  necessarily complete) index after a crash or other disaster more 
  quickly than a re-index operation would yield.
 
  I can't yet afford the extravagance of a separate SolR replica just 
  for backups, and I'm not sure if I'll ever have the luxury. I'm 
  currently running with just one node, but we are not yet live.
 
  I can think of the following ways to do this, each with various 
  downsides:
 
  1) Just backup the existing index files whilst indexing continues
  + Easy
  + Fast
  - Incomplete
  - Potential for corruption? (e.g. partial files)
 
  2) Stop/Start Tomcat
  + Easy
  - Very slow, and I/O- and CPU-intensive
  - Client gets errors when trying to connect
 
  3) Block/unblock SolR port with IpTables
  + Fast
  - Client gets errors when trying to connect
  - Have to wait for existing transactions to complete (not sure 
  how, maybe watch socket FD's in /proc)
 
  4) Pause/Restart SolR service
  + Fast ? (hopefully)
  - Client gets errors when trying to connect
 
  In any event, the web app will have to gracefully handle 
  unavailability of SolR, probably by displaying a down for 
  maintenance message, but this should preferably be only a very short 
  amount of time.
 
  Can anyone comment on my proposed solutions above, or provide any 
  additional ones?
 
  Thanks for any input you can provide!
 
  -Andy
 
 


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Mikhail Khludnev
Jack,
FWIW I've found occurrence in SystemInfoHandler.java


On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky j...@basetechnology.com wrote:

 I checked the 4.x source code and except for the fact that you will get a
 warning if you leave it out, nothing uses that name. But... that's not to
 say that a future release might not require it - the doc/comments don't
 explicitly say that it is optional.

 Note that the version attribute is optional (as per the source code, but
 no mention in doc/comments) and defaults to 1.0, with no warning.

 -- Jack Krupansky


 -Original Message- From: Alexandre Rafalovitch
 Sent: Thursday, December 20, 2012 12:08 AM
 To: solr-user@lucene.apache.org
 Subject: Where does schema.xml's schema/@name displays?

 Hello,

 In the schema.xml, we have a name attribute on the root node. The
 documentation says it is for display purpose only. But for display where?

 It seems that the admin console uses the name in the solr.xml file instead.
 And deleting the name attribute does not seem to cause any problems either.

 The reason I ask is because I am writing an explanation example which
 involves schema.xml config file being copied and modified over and over
 again. If @name is significant, I need to mention changing it. If not, I
 will just delete it altogether.

 Regards,
   Alex.


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread alxsss
Depending on your architecture, why not index the same data into two machines?
One would be your prod, the other your backup.

Thanks.
Alex.


-Original Message-
From: Upayavira u...@odoko.co.uk
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups


You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
 To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
 that the index is never in a bogus state. All data files are written and 
 flushed to disk, then the segments.* files are written that match the 
 data files. You can capture the files with a set of hard links to create 
 a backup.
 
 The CheckIndex program will verify the index backup.
 java -cp yourcopy/lucene-core-SOMETHING.jar 
 org.apache.lucene.index.CheckIndex collection/data/index
 
 lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
 Solr is unpacked.
 
 On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
  Hi all.
 
  Can anyone advise me of a way to pause and resume SolR 4 so I can 
  perform a backup? I need to be able to revert to a usable (though not 
  necessarily complete) index after a crash or other disaster more 
  quickly than a re-index operation would yield.
 
  I can't yet afford the extravagance of a separate SolR replica just 
  for backups, and I'm not sure if I'll ever have the luxury. I'm 
  currently running with just one node, but we are not yet live.
 
  I can think of the following ways to do this, each with various 
  downsides:
 
  1) Just backup the existing index files whilst indexing continues
  + Easy
  + Fast
  - Incomplete
  - Potential for corruption? (e.g. partial files)
 
  2) Stop/Start Tomcat
  + Easy
  - Very slow, and I/O- and CPU-intensive
  - Client gets errors when trying to connect
 
  3) Block/unblock SolR port with IpTables
  + Fast
  - Client gets errors when trying to connect
  - Have to wait for existing transactions to complete (not sure 
  how, maybe watch socket FD's in /proc)
 
  4) Pause/Restart SolR service
  + Fast ? (hopefully)
  - Client gets errors when trying to connect
 
  In any event, the web app will have to gracefully handle 
  unavailability of SolR, probably by displaying a down for 
  maintenance message, but this should preferably be only a very short 
  amount of time.
 
  Can anyone comment on my proposed solutions above, or provide any 
  additional ones?
 
  Thanks for any input you can provide!
 
  -Andy
 
 

 


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Jack Krupansky

Yeah... not sure how I missed it, but my search sees it now.

Also, the name will default to schema.xml if you do leave it out of the 
schema.


-- Jack Krupansky

-Original Message- 
From: Mikhail Khludnev

Sent: Thursday, December 20, 2012 3:06 PM
To: solr-user
Subject: Re: Where does schema.xml's schema/@name displays?

Jack,
FWIW I've found occurrence in SystemInfoHandler.java


On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky 
j...@basetechnology.com wrote:



I checked the 4.x source code and except for the fact that you will get a
warning if you leave it out, nothing uses that name. But... that's not to
say that a future release might not require it - the doc/comments don't
explicitly say that it is optional.

Note that the version attribute is optional (as per the source code, but
no mention in doc/comments) and defaults to 1.0, with no warning.

-- Jack Krupansky


-Original Message- From: Alexandre Rafalovitch
Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lucene.apache.org
Subject: Where does schema.xml's schema/@name displays?

Hello,

In the schema.xml, we have a name attribute on the root node. The
documentation says it is for display purpose only. But for display where?

It seems that the admin console uses the name in the solr.xml file 
instead.
And deleting the name attribute does not seem to cause any problems 
either.


The reason I ask is because I am writing an explanation example which
involves schema.xml config file being copied and modified over and over
again. If @name is significant, I need to mention changing it. If not, I
will just delete it altogether.

Regards,
  Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch

- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com 



Re: Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Alexandre Rafalovitch
What happens if you just supply it as CDATA into a string field? Store, no
index, probably compressed and lazy.

Regards,
Alex
On 20 Dec 2012 09:30, Modou DIA modo...@gmail.com wrote:

 Hi everybody,

 I'm a newbie with Solr technologies, but in the past I worked with Lucene
 and another solution similar to Solr.
 I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
 a Cocoon 2.1 application.

 I want to know if it's possible to store (without indexing) a field
 containing an XML sequence. I mean a field which can store XML data in
 the index without losing XPath information.

 For example, this is a document to index:

 <add>
   <doc>
     <field name="id">id_1</field>
     <field name="info">testing</field>
     <field name="subdoc">
       <subdoc id="id_1">
         <data>testing</data>
       </subdoc>
     </field>
   </doc>
   ...
 </add>

 As you can see, the field named subdoc contains an XML sequence.

 So, when I query the indexes, I want to retrieve the data in subdoc
 and preserve the XML markup.

 Thank you for your help.
 --
 --
 | Modou DIA
 | modo...@gmail.com
 --



Re: Store and retrieve an xml sequence without losing the markup

2012-12-20 Thread Upayavira
Right, you can store it, but you can't search on it that way, and you
certainly can't do complex searches that take the XML structure into
account (e.g. xpath queries).

Upayavira

On Thu, Dec 20, 2012, at 10:22 PM, Alexandre Rafalovitch wrote:
 What happens if you just supply it as CDATA into a string field? Store, no
 index, probably compressed and lazy.
 
 Regards,
 Alex
 On 20 Dec 2012 09:30, Modou DIA modo...@gmail.com wrote:
 
  Hi everybody,
 
  I'm a newbie with Solr technologies, but in the past I worked with Lucene
  and another solution similar to Solr.
  I'm working with Solr 4.0. I use SolrJ to embed a Solr server in
  a Cocoon 2.1 application.

  I want to know if it's possible to store (without indexing) a field
  containing an XML sequence. I mean a field which can store XML data in
  the index without losing XPath information.
 
  For example, this is a document to index:
 
  <add>
    <doc>
      <field name="id">id_1</field>
      <field name="info">testing</field>
      <field name="subdoc">
        <subdoc id="id_1">
          <data>testing</data>
        </subdoc>
      </field>
    </doc>
    ...
  </add>
 
  As you can see, the field named subdoc contains an XML sequence.

  So, when I query the indexes, I want to retrieve the data in subdoc
  and preserve the XML markup.
 
  Thank you for your help.
  --
  --
  | Modou DIA
  | modo...@gmail.com
  --
 


RE: occasional GC crashes

2012-12-20 Thread Petersen, Robert
Hi Otis,

I thought Java 7 had a bug, which wasn't being addressed by Oracle, that made 
it unsuitable for Solr.  Did that get fixed?
http://searchhub.org/2011/07/28/dont-use-java-7-for-anything/

I did see this but it doesn't really mention the bug:  
http://opensearchnews.com/2012/04/announcing-java7-support-with-apache-solr-and-lucene/

Thanks
Robi


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Tuesday, December 18, 2012 5:25 PM
To: solr-user@lucene.apache.org
Subject: Re: occasional GC crashes

Robert,

Step 1 is to get the latest Java 7, or, if you have to remain on 6, then use 
the latest 6.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm

On Dec 18, 2012 7:54 PM, Petersen, Robert rober...@buy.com wrote:

  Hi solr user group,

 Sorry if this isn't directly a Solr question.  Seems like once in a 
 blue moon the GC crashes on a server in our Solr 3.6.1 slave farm.  
 This seems to happen only on a couple of the twelve slaves we have 
 deployed, and only very rarely on those.  It seems like this doesn't 
 directly affect Solr, because in the logs it looks like Solr keeps 
 working after the time of the exception, but our external monitoring 
 tool reports that the Solr service is down, so our operations department 
 restarts Solr on that box and alerts me.
 The Solr logs show nothing unusual.  The exception does show up in the 
 catalina.out log file, though.  Does this happen to anyone else?  Here is
 the basic error; I have attached the crash dump file also.   Our total
 uptime on these boxes is over a year now, BTW.

 ** **

 #

 # A fatal error has been detected by the Java Runtime Environment:

 #

 #  SIGSEGV (0xb) at pc=0x2b5379346612, pid=13724, 
 tid=1082353984

 #

 # JRE version: 6.0_25-b06

 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode
 linux-amd64 )

 # Problematic frame:

 # V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned
 long)+0x82

 #

 # An error report file with more information is saved as:

 # /var/LucidWorks/lucidworks/hs_err_pid13724.log

 #

 # If you would like to submit a bug report, please visit:

 #   http://java.sun.com/webapps/bugreport/crash.jsp

 #

 ** **

 VM Arguments:

 jvm_args:
 -Djava.util.logging.config.file=/var/LucidWorks/lucidworks/tomcat/conf
 /logging.properties 
 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
 -Xmx32768m -Xms32768m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
 -Dcom.sun.management.jmxremote 
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dcom.sun.management.jmxremote.port=6060
 -Djava.endorsed.dirs=/var/LucidWorks/lucidworks/tomcat/endorsed
 -Dcatalina.base=/var/LucidWorks/lucidworks/tomcat
 -Dcatalina.home=/var/LucidWorks/lucidworks/tomcat
 -Djava.io.tmpdir=/var/LucidWorks/lucidworks/tomcat/temp 

 java_command: org.apache.catalina.startup.Bootstrap -server 
 -Dsolr.solr.home=lucidworks/solr start

 Launcher Type: SUN_STANDARD

 ** **

 Stack: [0x,0x],  
 sp=0x40835eb0, free space=1056983k

 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, 
 C=native
 code)

 V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned
 long)+0x82

 V  [libjvm.so+0x3c481a]  
 CMSConcMarkingTask::do_work_steal(int)+0xfa

 V  [libjvm.so+0x3c3dcf]  CMSConcMarkingTask::work(int)+0xef

 V  [libjvm.so+0x8783dc]  YieldingFlexibleGangWorker::loop()+0xbc

 V  [libjvm.so+0x8755b4]  GangWorker::run()+0x24

 V  [libjvm.so+0x71096f]  java_start(Thread*)+0x13f

 ** **

 Heap

 par new generation   total 345024K, used 180672K [0x2e12,
 0x2aaac578, 0x2aaac578)

   eden space 306688K,  53% used [0x2e12, 
 0x2aaab8243c28,
 0x2aaac0ca)

   from space 38336K,  40% used [0x2aaac321, 
 0x2aaac415c3f8,
 0x2aaac578)

   to   space 38336K,   0% used [0x2aaac0ca, 0x2aaac0ca,
 0x2aaac321)

 concurrent mark-sweep generation total 33171072K, used 12144213K 
 [0x2aaac578, 0x2ab2ae12, 0x2ab2ae12)

 concurrent-mark-sweep perm gen total 83968K, used 50650K 
 [0x2ab2ae12, 0x2ab2b332, 0x2ab2b332)

 ** **

 Code Cache  [0x2b054000, 0x2b9a4000, 
 0x2e054000)**
 **

 total_blobs=2800 nmethods=2273 adapters=480 free_code_cache=40752512
 largest_free_block=15808

 ** **

 ** **

 ** **

 Thanks,

 ** **

 *Robert (Robi) Petersen*

 Senior Software Engineer

 Search Department

 ** **




Japanese exact match results do not show on top of results

2012-12-20 Thread kirpakaro
Hi folks,

I am having a couple of problems with Japanese data: 1) it is not properly
indexing all the data; 2) displaying the exact-match result on top, followed
by 90% matches, 80% matches, etc., does not work.
I am using Solr 3.6.1 with text_ja as the fieldType; here is the schema:

   <field name="q" type="text_ja" indexed="true" stored="true"/>
   <field name="qs" type="text_general" indexed="false" stored="true"
          multiValued="true"/>
   <field name="q_e" type="string" indexed="true" stored="true"/>

   <copyField source="q" dest="q_e" maxChars="250"/>

What I want to achieve is that if there is an exact query match, it should
provide the results from q_e, followed by partial-match results from the q
field; and if there is nothing in q_e, then partial matches should come
from the q field.  This is how I specify the query:

http://localhost:7983/zoom/jp/select/?q=鹿児島
鹿児島銀行&rows=10&version=2.2&qf=query+query_exact^1&mm=90%25&pf=q^1+q_e^10
OR
version=2.2&rows=10&qf=q+q_e^1&pf=query^10+query_exact^1

Somehow the exact query matches do not come on top, though the data
contains them. It is puzzling that not all the documents get indexed
properly, but if I change the q field to string and q_e to text_ja then all
the records are indexed properly; that still does not solve the problem
of exact matches on top followed by partial matches.

text_ja field uses:
<filter class="solr.JapaneseBaseFormFilterFactory"/>
<filter class="solr.JapanesePartOfSpeechStopFilterFactory"
        tags="../../../solr/conf/lang/stoptags_ja.txt"
        enablePositionIncrements="true"/>
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="../../../solr/conf/lang/stopwords_ja.txt"
        enablePositionIncrements="true"/>
<filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
<filter class="solr.LowerCaseFilterFactory"/>

How can I solve this problem?

Thanks










--
View this message in context: 
http://lucene.472066.n3.nabble.com/Japanese-exact-match-results-do-not-show-on-top-of-results-tp4028422.html
Sent from the Solr - User mailing list archive at Nabble.com.
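
One hedged way to express exact-match-first ranking with edismax, assuming
the q (text_ja) and q_e (string) fields from the schema above: a boost query
on the untokenized field lifts exact matches without excluding partial ones.

    http://localhost:7983/zoom/jp/select?defType=edismax
      &q=鹿児島銀行
      &qf=q
      &bq=q_e:"鹿児島銀行"^10

This is a sketch, not the poster's confirmed fix; note also that the qf/pf
parameters in the original query reference fields (query, query_exact) that
do not appear in the schema snippet, which is worth checking first.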


Re: Japanese exact match results do not show on top of results

2012-12-20 Thread Robert Muir
I think you are hitting SOLR-3589. There is a vote underway for a 3.6.2
release that contains this fix.
On Dec 20, 2012 6:29 PM, kirpakaro khem...@yahoo.com wrote:

 Hi folks,

 I am having a couple of problems with Japanese data: 1) it is not properly
 indexing all the data; 2) displaying the exact-match result on top, followed
 by 90% matches, 80% matches, etc., does not work.
 I am using Solr 3.6.1 with text_ja as the fieldType; here is the schema:

    <field name="q" type="text_ja" indexed="true" stored="true"/>
    <field name="qs" type="text_general" indexed="false" stored="true"
           multiValued="true"/>
    <field name="q_e" type="string" indexed="true" stored="true"/>

    <copyField source="q" dest="q_e" maxChars="250"/>

 What I want to achieve is that if there is an exact query match, it should
 provide the results from q_e, followed by partial-match results from the q
 field; and if there is nothing in q_e, then partial matches should come
 from the q field.  This is how I specify the query:

 http://localhost:7983/zoom/jp/select/?q=鹿児島
 鹿児島銀行&rows=10&version=2.2&qf=query+query_exact^1&mm=90%25&pf=q^1+q_e^10
 OR
 version=2.2&rows=10&qf=q+q_e^1&pf=query^10+query_exact^1

 Somehow the exact query matches do not come on top, though the data
 contains them. It is puzzling that not all the documents get indexed
 properly, but if I change the q field to string and q_e to text_ja then all
 the records are indexed properly; that still does not solve the problem
 of exact matches on top followed by partial matches.

 text_ja field uses:
 <filter class="solr.JapaneseBaseFormFilterFactory"/>
 <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
         tags="../../../solr/conf/lang/stoptags_ja.txt"
         enablePositionIncrements="true"/>
 <filter class="solr.CJKWidthFilterFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="../../../solr/conf/lang/stopwords_ja.txt"
         enablePositionIncrements="true"/>
 <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
 <filter class="solr.LowerCaseFilterFactory"/>

 How can I solve this problem?

 Thanks










 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Japanese-exact-match-results-do-not-show-on-top-of-results-tp4028422.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: jconsole over jmx - should threads be visible?

2012-12-20 Thread Chris Hostetter

: If I connect jconsole to a remote Solr installation (or any app) using jmx,
: all the graphs are populated except 'threads' ... is this expected, or have I
: done something wrong?  I can't seem to locate the answer with google.

I just tried running the 4x Solr example with the Jetty options to allow 
remote JMX...

java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false  
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.port=1099 -jar start.jar 

...and was then able to monitor using jconsole and see all of the thread 
info as well from a remote machine.

-Hoss
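
With those options in place, connecting from another box is just (assuming
the same port as above, and a hypothetical host name):

    jconsole some-remote-host:1099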


Store document while using Solr

2012-12-20 Thread Nicholas Li
hi there,

I am quite new to Solr and have a very basic question about storing and
indexing documents.

I am trying with the Solr example, and when I run a command like 'java -jar
post.jar foo/test.xml', it gives me the feeling that Solr will index the
given file, no matter where it is stored, and Solr won't copy this file
to some other location in the file system.  Am I correct?

If I want to use the file system to manage the documents, it seems better
to define some location, which will be used to store all the potential
files (it may need some processing to move/copy/upload the files to this
location), then use Solr to index them under this location. Am I correct?

Cheers,
Nick


Re: Store document while using Solr

2012-12-20 Thread Otis Gospodnetic
Hi,

You can use Solr's DataImportHandler to index files in the file system.
 You could set things up in such a way that Solr keeps indexing whatever
you put in some specific location in the FS.  This is not the most common
setup, but it's certainly possible.  Solr keeps the searchable index in its
own directory, defined in one of its configs.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Thu, Dec 20, 2012 at 8:15 PM, Nicholas Li nicholas...@yarris.com wrote:

 hi there,

 I am quite new to Solr and have a very basic question about storing and
 indexing documents.

 I am trying with the Solr example, and when I run a command like 'java -jar
 post.jar foo/test.xml', it gives me the feeling that Solr will index the
 given file, no matter where it is stored, and Solr won't copy this file
 to some other location in the file system.  Am I correct?

 If I want to use the file system to manage the documents, it seems better
 to define some location, which will be used to store all the potential
 files (it may need some processing to move/copy/upload the files to this
 location), then use Solr to index them under this location. Am I correct?

 Cheers,
 Nick
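
A minimal sketch of such a DataImportHandler setup; the paths and field names
here are hypothetical, and FileListEntityProcessor only walks the directory,
so a nested entity does the actual reading of each file:

    <!-- hypothetical data-config.xml: index every *.xml found under /data/inbox -->
    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/data/inbox" fileName=".*\.xml"
                recursive="true" rootEntity="false" dataSource="null">
          <entity name="file" processor="XPathEntityProcessor"
                  url="${files.fileAbsolutePath}" forEach="/doc">
            <field column="id" xpath="/doc/id"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

Scheduling a delta/full import against this config is what gives the keep
indexing whatever you put there behavior Otis describes.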



RE: occasional GC crashes

2012-12-20 Thread Otis Gospodnetic
Hi Robi,

Oh that's the thing of the past, go for the latest Java 7 if they let you!

Otis
--
Performance Monitoring - http://sematext.com/spm
On Dec 20, 2012 6:29 PM, Petersen, Robert rober...@buy.com wrote:

 Hi Otis,

 I thought Java 7 had a bug, which wasn't being addressed by Oracle, that
 made it unsuitable for Solr.  Has that been fixed now?
 http://searchhub.org/2011/07/28/dont-use-java-7-for-anything/

 I did see this but it doesn't really mention the bug:
 http://opensearchnews.com/2012/04/announcing-java7-support-with-apache-solr-and-lucene/

 Thanks
 Robi


 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Tuesday, December 18, 2012 5:25 PM
 To: solr-user@lucene.apache.org
 Subject: Re: occasional GC crashes

 Robert,

 Step 1 is to get the latest Java 7 or if you have to remain on 6 then use
 the latest 6.

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm
 On Dec 18, 2012 7:54 PM, Petersen, Robert rober...@buy.com wrote:

  Hi solr user group,

  Sorry if this isn't directly a Solr question.  Seems like once in a
  blue moon the GC crashes on a server in our Solr 3.6.1 slave farm.
  This seems to only happen on a couple of the twelve slaves we have
  deployed and only very rarely on those.  It seems like this doesn't
  directly affect solr because in the logs it looks like solr keeps
  working after the time of the exception but our external monitoring
  tool reports that the solr service is down so our operations department
  restarts solr on that box and alerts me.
  The solr logs show nothing unusual.  The exception does show up in the
  catalina.out log file though.  Does this happen to anyone else?  Here is
  the basic error and I have attached the crash dump file also.   Our total
  uptime on these boxes is over a year now BTW.

  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  SIGSEGV (0xb) at pc=0x2b5379346612, pid=13724, tid=1082353984
  #
  # JRE version: 6.0_25-b06
  # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 )
  # Problematic frame:
  # V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
  #
  # An error report file with more information is saved as:
  # /var/LucidWorks/lucidworks/hs_err_pid13724.log
  #
  # If you would like to submit a bug report, please visit:
  #   http://java.sun.com/webapps/bugreport/crash.jsp
  #

  VM Arguments:
  jvm_args:
  -Djava.util.logging.config.file=/var/LucidWorks/lucidworks/tomcat/conf/logging.properties
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
  -Xmx32768m -Xms32768m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.port=6060
  -Djava.endorsed.dirs=/var/LucidWorks/lucidworks/tomcat/endorsed
  -Dcatalina.base=/var/LucidWorks/lucidworks/tomcat
  -Dcatalina.home=/var/LucidWorks/lucidworks/tomcat
  -Djava.io.tmpdir=/var/LucidWorks/lucidworks/tomcat/temp

  java_command: org.apache.catalina.startup.Bootstrap -server -Dsolr.solr.home=lucidworks/solr start
  Launcher Type: SUN_STANDARD

  Stack: [0x,0x], sp=0x40835eb0, free space=1056983k
  Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
  V  [libjvm.so+0x3c4612]  Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
  V  [libjvm.so+0x3c481a]  CMSConcMarkingTask::do_work_steal(int)+0xfa
  V  [libjvm.so+0x3c3dcf]  CMSConcMarkingTask::work(int)+0xef
  V  [libjvm.so+0x8783dc]  YieldingFlexibleGangWorker::loop()+0xbc
  V  [libjvm.so+0x8755b4]  GangWorker::run()+0x24
  V  [libjvm.so+0x71096f]  java_start(Thread*)+0x13f

  Heap
  par new generation   total 345024K, used 180672K [0x2e12, 0x2aaac578, 0x2aaac578)
    eden space 306688K,  53% used [0x2e12, 0x2aaab8243c28, 0x2aaac0ca)
    from space 38336K,  40% used [0x2aaac321, 0x2aaac415c3f8, 0x2aaac578)
    to   space 38336K,   0% used [0x2aaac0ca, 0x2aaac0ca, 0x2aaac321)
  concurrent mark-sweep generation total 33171072K, used 12144213K [0x2aaac578, 0x2ab2ae12, 0x2ab2ae12)
  concurrent-mark-sweep perm gen total 83968K, used 50650K [0x2ab2ae12, 0x2ab2b332, 0x2ab2b332)

  Code Cache  [0x2b054000, 0x2b9a4000, 0x2e054000)
  total_blobs=2800 nmethods=2273 adapters=480 free_code_cache=40752512 largest_free_block=15808

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-20 Thread Otis Gospodnetic
Hi,

Have a look at http://search-lucene.com/?q=invalid+version+javabin

Otis
--
Solr Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Hi,

 I'm encountering this error randomly when running a distributed facet
 (i.e. I'm sending the exact same request, yet it does not reproduce
 consistently).
 I have about 180 shards that are being queried.
 It seems that when Solr distributes the request to the shards, one, or
 perhaps more, of the shards return an XML reply instead of javabin.

 I added some debug output to JavaBinCodec.unmarshal (as done in the
 debugging.patch of SOLR-3258) to check whether the XML reply holds an error
 or not, and I noticed that the XML actually holds the response from one of
 the shards.

 I'm using the patch provided in SOLR-2894 on top of trunk 1404975.

 Has anyone encountered such an issue? Any ideas?

 Thanks,

 Shahar.



Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Alexandre Rafalovitch
Thank you.

So, the conclusion to me is that @name can be skipped. It is not used in
anything (or anything critical anyway) and there is a default. That's good
enough for me.

On the other hand, having @version default to 1.0 is probably an oversight,
given the number of changes present. Should it not default to the latest, or
at least to 1.5 (and change periodically)?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Dec 21, 2012 at 7:50 AM, Jack Krupansky j...@basetechnology.com wrote:

 Yeah... not sure how I missed it, but my search sees it now.

 Also, the name will default to schema.xml if you leave it out of the
 schema.

 -- Jack Krupansky

 -Original Message- From: Mikhail Khludnev
 Sent: Thursday, December 20, 2012 3:06 PM
 To: solr-user
 Subject: Re: Where does schema.xml's schema/@name displays?


 Jack,
 FWIW I've found an occurrence in SystemInfoHandler.java


 On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky j...@basetechnology.com wrote:

  I checked the 4.x source code and except for the fact that you will get a
 warning if you leave it out, nothing uses that name. But... that's not to
 say that a future release might not require it - the doc/comments don't
 explicitly say that it is optional.

 Note that the version attribute is optional (as per the source code, but
 no mention in doc/comments) and defaults to 1.0, with no warning.

 -- Jack Krupansky


 -Original Message- From: Alexandre Rafalovitch
 Sent: Thursday, December 20, 2012 12:08 AM
 To: solr-user@lucene.apache.org
 Subject: Where does schema.xml's schema/@name displays?

 Hello,

 In the schema.xml, we have a name attribute on the root node. The
 documentation says it is for display purposes only. But for display where?

 It seems that the admin console uses the name in the solr.xml file
 instead.
 And deleting the name attribute does not seem to cause any problems
 either.

 The reason I ask is because I am writing an explanation example which
 involves the schema.xml config file being copied and modified over and over
 again. If @name is significant, I need to mention changing it. If not, I
 will just delete it altogether.
 Regards,
   Alex.


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com
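
For clarity, a schema header that states both attributes explicitly avoids
the default-version surprise discussed above; the name and version values
here are illustrative:

    <schema name="example" version="1.5">
      ...
    </schema>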



Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Chris Hostetter

: On another hand, having @version default to 1.0 is probably an oversight,
: given the number of changes present Should it not default to latest or
: at least to 1.5 (and change periodically)?

If the default value changed, then users w/o a version attribute in their 
schema would suddenly get very different behavior if they upgraded from 
one version of solr the the next.


-Hoss


Re: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Alexandre Rafalovitch
I agree actually (about not surprising the users). But the consequences of
forgetting this value may also lead to some serious debugging issues.

An interesting (not sure if reasonable) compromise would be to add an error
for the combination of @version=1 and the @multiValued attribute: make sure
Solr actually complains if it sees such a combination, and that the message
explicitly says something like "What's your @version value? Maybe it needs
to be explicit/more recent." Same with autoGeneratePhraseQueries and
@version=1.4.

Then, somebody patching together a config file from multiple sources will
be guided in the right direction.

Just a newbie-oriented thought. I am sure there are other more pressing
things in the pipeline.

Regards,
 Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Dec 21, 2012 at 4:49 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : On another hand, having @version default to 1.0 is probably an oversight,
 : given the number of changes present Should it not default to latest
 or
 : at least to 1.5 (and change periodically)?

 If the default value changed, then users w/o a version attribute in their
 schema would suddenly get very different behavior if they upgraded from
 one version of solr the the next.


 -Hoss



Re: How to add the extra analyzer jar?

2012-12-20 Thread SuoNayi
The issue has been solved; sorry for my oversight.




At 2012-12-21 11:10:53,SuoNayi suonayi2...@163.com wrote:
Hi all, for SolrCloud (Solr 4.0), how do I add a third-party analyzer?
There is a third-party analyzer jar and I want to integrate it with SolrCloud.
Here are my steps, but a ClassNotFoundException is thrown at startup.
1. Add the fieldType in schema.xml; here is a snippet:
<!-- ikanalyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"
               isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"
               isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
2. Add the IKAnalyzer.cfg.xml and stopword.dic files into the classes
directory of the solr.war (open the war and add those two files).
3. Use start.jar to start up, and the ClassNotFoundException is thrown.


Could someone help me figure out what's wrong, or tell me where I can add the
extra/third-party jar into the classpath of SolrCloud?


Thanks,


SuoNayi
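
For anyone hitting the same ClassNotFoundException: rather than repacking
solr.war, jars can usually be loaded per core with a lib directive in
solrconfig.xml (the directory path shown is an assumption; with SolrCloud the
jar must be present on every node):

    <lib dir="../../lib" regex=".*\.jar" />

Dropping the analyzer jar into that directory and restarting is normally
enough for the tokenizer factory class to resolve.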


Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.

2012-12-20 Thread Shinichiro Abe
You can place the missing JAR files in the contrib/extraction/lib.

For class files: asm-x.x.jar
For mp4 files: aspectjrt-x.x.jar

FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209

Regards,
Shinichiro Abe
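
For reference, the stock example solrconfig.xml loads that directory with a
lib directive like the following (the relative path depends on where the
core's instance dir sits in your layout):

    <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />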

On 2012/12/21, at 15:08, Shigeki Kobayashi wrote:

 Hi,
 
 I use ManifoldCF 1.1dev to crawl files and index them into Solr 4.0.
 
 While indexing class files and mp4 files, Solr caused a NoClassDefFoundError
 as follows:
 
 Indexing a mp4 file
 
 2012-12-19
 06:16:48,485%P[solr.servlet.SolrDispatchFilter]-[TP-Processor44]-:null:java.lang.RuntimeException:
 java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
 filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
 org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
at
 org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
at
 org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
at
 org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
at
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:117)
at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
... 18 more
 Caused by: java.lang.ClassNotFoundException: org.aspectj.lang.Signature
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 29 more
 
 --
 Indexing a class file
 
 2012-12-19
 08:10:58,327%P[solr.servlet.SolrDispatchFilter]-[TP-Processor3]-:null:java.lang.RuntimeException:
 java.lang.NoClassDefFoundError: org/objectweb/asm/ClassVisitor
at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
 filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
 

Which token filter can combine 2 terms into 1?

2012-12-20 Thread Xi Shen
Hi,

I am looking for a token filter that can combine 2 terms into 1. E.g.,
the input has been tokenized by whitespace:

t1 t2 t2a t3

I want a filter that outputs:

t1 t2t2a t3

I know it is a very special case, and I am thinking about developing a filter
of my own. But I cannot figure out which API I should use to look at the terms
in a TokenStream.


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
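
For the archives, a minimal sketch of a one-token-lookahead TokenFilter using
the Lucene 4.x attribute API. The shouldMerge rule is a hypothetical
placeholder; after a merge, offsets and other attributes are simply left at
the second token's values, which is fine for a sketch but may need adjusting.

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.AttributeSource;

    /** Merges a token with its successor whenever shouldMerge() says so. */
    public final class PairConcatFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private AttributeSource.State pending; // lookahead token we decided not to merge

      public PairConcatFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        // load the first token of the pair: the buffered lookahead, or a fresh one
        if (pending != null) {
          restoreState(pending);
          pending = null;
        } else if (!input.incrementToken()) {
          return false;
        }
        String first = termAtt.toString();
        AttributeSource.State firstState = captureState();
        if (!input.incrementToken()) {    // stream ended: emit the last token as-is
          restoreState(firstState);
          return true;
        }
        String second = termAtt.toString();
        if (shouldMerge(first, second)) { // e.g. "t2" + "t2a" -> "t2t2a"
          termAtt.setEmpty().append(first).append(second);
          return true;
        }
        pending = captureState();         // hold the second token; emit the first now
        restoreState(firstState);
        return true;
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        pending = null;
      }

      // hypothetical merge rule -- replace with whatever identifies your pairs
      private boolean shouldMerge(String first, String second) {
        return second.startsWith(first);
      }
    }

On "t1 t2 t2a t3" this sketch emits t1, then the merged t2t2a, then t3; only
one pair is merged at a time, so a merged token is never re-merged with what
follows it.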


Solr 3.5: java.lang.NegativeArraySizeException caused by negative start value

2012-12-20 Thread Shawn Heisey

This is on Solr 3.5.0.

We are getting a java.lang.NegativeArraySizeException when our webapp 
sends a query where the start parameter is set to a negative value. 
This seems to set off a denial of service problem within Solr.  I don't 
yet know whether it's a mistake in coding, or whether some malicious 
user has found an attack vector on our site.


After the first exception, another exception 
(org.mortbay.jetty.EofException) appears in the logs with increasing 
frequency.  Within minutes of the first exception, the load balancer 
complains about having no servers available because ping requests are 
failing.


This is distributed search, but the shards parameter is in 
solrconfig.xml, not provided by the client.


Full exception:

Dec 20, 2012 7:41:34 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NegativeArraySizeException
at 
org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:108)
at 
org.apache.solr.handler.component.ShardFieldSortedHitQueue.init(ShardDoc.java:139)
at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:712)
at 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:571)
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:550)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



Later exceptions:

Dec 21, 2012 12:24:37 AM org.apache.solr.common.SolrException log
SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)

at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at 
org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
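
Until the source of the bad requests is found, a hedged mitigation is to
sanitize paging parameters in the webapp before they reach Solr. This is a
hypothetical front-end guard, not a Solr feature:

    // hypothetical guard applied to the raw request parameter before
    // building the Solr query string
    public final class PagingGuard {
      static int sanitizeStart(String raw) {
        try {
          return Math.max(Integer.parseInt(raw), 0); // clamp negatives to 0
        } catch (NumberFormatException e) {
          return 0;                                  // non-numeric input: first page
        }
      }
    }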

Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.

2012-12-20 Thread Shigeki Kobayashi
Thanks Abe-san!

Your advice is very informative.

Thanks again.


Regards,

Shigeki


2012/12/21 Shinichiro Abe shinichiro.ab...@gmail.com

 You can place the missing JAR files in the contrib/extraction/lib.

 For class files: asm-x.x.jar
 For mp4 files: aspectjrt-x.x.jar

 FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209

 Regards,
 Shinichiro Abe

 On 2012/12/21, at 15:08, Shigeki Kobayashi wrote:

  Hi,
 
  I use ManifoldCF 1.1dev to crawl files and index them into Solr 4.0.
 
  While indexing class files and mp4 files, Solr caused a
  NoClassDefFoundError as follows:
 
  Indexing a mp4 file
 
  2012-12-19
 
 06:16:48,485%P[solr.servlet.SolrDispatchFilter]-[TP-Processor44]-:null:java.lang.RuntimeException:
  java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 
 filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
  org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
 at
  org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
 at
 org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
 at
 
 org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
 at
 
 org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
 at
 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
 at java.lang.Thread.run(Thread.java:662)
  Caused by: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
 at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:117)
 at
  org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at
  org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at
  org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 at
 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at
 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 ... 18 more
  Caused by: java.lang.ClassNotFoundException: org.aspectj.lang.Signature
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at
 java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 ... 29 more
 
  --
  Indexing a class file
 
  2012-12-19
 
 08:10:58,327%P[solr.servlet.SolrDispatchFilter]-[TP-Processor3]-:null:java.lang.RuntimeException:
  java.lang.NoClassDefFoundError: org/objectweb/asm/ClassVisitor
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at