Re: PostingsFormat block size

2015-01-28 Thread Trym Møller

Hi

Thanks for your input.

I do not do updates to the existing docs, so that is not relevant in my 
case, and I have just skipped that test case :-)
I have not been able to measure any significant changes to the 
distributed searches or just doing a direct search for an id.


Did I miss something with your comment "Here it is"?

Best regards Trym

On 27-01-2015 17:22, Mikhail Khludnev wrote:

Hm.. it's not the blocks that I'm familiar with. Regarding performance impact
from bigger ID blocks: IDs are looked up if you send
updates for existing docs. And IDs are also used for some of the distributed
search stages, I suppose. Here it is.

On Tue, Jan 27, 2015 at 4:33 PM, Trym Møller  wrote:


Hi

Thanks for your clarifying questions.

In the constructor of the Lucene41PostingsFormat class the minimum and
maximum block size is provided. These sizes are used when creating the
BlockTreeTermsWriter (responsible for writing the .tim and .tip files of
the lucene index). It is the blocksizes of the BlockTreeTermsWriter I refer
to.

I'm not quite sure I understand your second question - sorry.
I can tell that I have not tried whether the PulsingPostingsFormat is of any
help in regards to lowering the Solr JVM memory usage, but I can see that the
same BlockTreeTermsWriter with its block sizes is used by the
PulsingPostingsFormat.
Should I expect something else from the PulsingPostingsFormat in regards
to memory usage or in regards to searching (if I have changed the block
sizes of the BlockTreeTermsWriter)?

Best regards Trym


On 27-01-2015 14:00, Mikhail Khludnev wrote:


Hello Trym,

Can you clarify, which blockSize do you mean? And the second q, just to
avoid unnecessary explanation, do you know what's Pulsing?

On Tue, Jan 27, 2015 at 2:28 PM, Trym Møller  wrote:

  Hi

I have successfully created a really cool Lucene41x8PostingsFormat class (a
copy of the Lucene41PostingsFormat class modified to use 8 times the
default block size) and registered the format as required. In the schema.xml I
have created a string field type with this postingsFormat and lastly I'm
using this field type for my id field. This all works great and as a
consequence the .tip files of the Lucene index (segments) are considerably
smaller, and the same goes for the Solr JVM memory usage (which was the end
goal).
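
For reference, such a custom format amounts to roughly the following (a sketch only,
not the exact class used; it simply scales the Lucene41PostingsFormat defaults of
25/48 by eight and delegates everything else):

import java.io.IOException;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

// Sketch: same postings as Lucene41, but with 8x the default
// BlockTreeTermsWriter block sizes (defaults are min=25, max=48).
public class Lucene41x8PostingsFormat extends PostingsFormat {

    private final PostingsFormat delegate = new Lucene41PostingsFormat(25 * 8, 48 * 8);

    public Lucene41x8PostingsFormat() {
        super("Lucene41x8");
    }

    @Override
    public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
        return delegate.fieldsConsumer(state);
    }

    @Override
    public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
        return delegate.fieldsProducer(state);
    }
}

The class then has to be registered in META-INF/services/org.apache.lucene.codecs.PostingsFormat
and referenced from schema.xml via postingsFormat="Lucene41x8" on the field type, as described above.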

Now I need to find the consequences (besides the disk and memory usage) of
this change to the id-field. I would expect that id-searches are slower.
But when will Solr/Lucene do id-searches? I have myself no user scenarios
where my documents are searched by the id value.

Thanks for any comments.

Best regards Trym









Define Id when using db dih

2015-01-28 Thread SolrUser1543
Hi,  

I am using data import handler and import data from oracle db. 
I have a problem: the table I am importing from has no single column which
is defined as a key.
How should I define the key in the data config file?

Thanks 
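
One common approach is to synthesize the uniqueKey in the SELECT itself by
concatenating the columns that together identify a row. A sketch of the data
config (the driver, connection, table and column names are all illustrative):

<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="user" password="password"/>
  <document>
    <!-- Build a synthetic id from the columns that uniquely identify a row. -->
    <entity name="item" pk="id"
            query="SELECT col_a || '-' || col_b AS id, col_a, col_b, other_col FROM my_table">
      <field column="ID" name="id"/>
      <field column="OTHER_COL" name="other_col"/>
    </entity>
  </document>
</dataConfig>

The schema's uniqueKey field then has to point at that synthesized id field.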




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Define-Id-when-using-db-dih-tp4182797.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
BTW:
None of my core folders contains a core.properties file ... ? Could it be due 
to the fact that I am (so far) running only EmbeddedSolrServer, hence no real 
Solr-Server?

-Ursprüngliche Nachricht-
Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Gesendet: Donnerstag, 29. Januar 2015 08:08
An: solr-user@lucene.apache.org
Betreff: AW: CoreContainer#createAndLoad, existing cores not loaded

Thx Shawn. I am running latest-greatest Solr (4.10.3). Solr home is e.g.
/opt/webs//WebContent/WEB-INF/solr
the core(s) reside in
/opt/webs//WebContent/WEB-INF/solr/cores
Should these be found by core discovery?
If not, how can I configure coreRootDirectory in solr.xml to be "the cores folder
below the Solr home"?

${coreRootDirectory:/cores}

Note:
the solr.xml is to be used for any of the 150 sites we host. Therefore we'd like it
to be "generic" -> /cores

-Ursprüngliche Nachricht-
Von: Shawn Heisey [mailto:apa...@elyograg.org]
Gesendet: Mittwoch, 28. Januar 2015 17:08
An: solr-user@lucene.apache.org
Betreff: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
> My problem:
> I create cores dynamically using container#create( CoreDescriptor ) and then 
> add documents to the very core(s). So far so good.
> When I restart my app I do
> container = CoreContainer#createAndLoad(...) but when I then call
> container.getAllCoreNames() an empty list is returned.
>
> What cores should be loaded by the container if I call
> CoreContainer#createAndLoad(...)
> ? Where does the container lookup the "existing cores"?

If the solr.xml is the old format, then cores are defined in solr.xml, in the
<cores> section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and later 
and will be mandatory in 5.0.  If that format is present, then Solr will use 
core discovery -- starting from either the solr home or a defined 
coreRootDirectory, solr will search for core.properties files and treat each 
directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn
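
For illustration, a minimal discovery-style solr.xml looks roughly like this (a
sketch; the relative path is just an example and is resolved against the Solr home):

<solr>
  <str name="coreRootDirectory">${coreRootDirectory:cores}</str>
</solr>

Each directory below that root then needs its own core.properties file, which can be
as small as a single line such as name=site1 (the name is an example); without that
marker file, core discovery will not pick the directory up.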



AW: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
Thx Shawn. I am running latest-greatest Solr (4.10.3)
Solr home is e.g.
/opt/webs//WebContent/WEB-INF/solr
the core(s) reside in
/opt/webs//WebContent/WEB-INF/solr/cores
Should these be found by core discovery? 
If not, how can I configure coreRootDirectory in solr.xml to be "the cores folder
below the Solr home"?

${coreRootDirectory:/cores}

Note:
the solr.xml is to be used for any of the 150 sites we host. Therefore we'd like it
to be "generic" -> /cores

-Ursprüngliche Nachricht-
Von: Shawn Heisey [mailto:apa...@elyograg.org] 
Gesendet: Mittwoch, 28. Januar 2015 17:08
An: solr-user@lucene.apache.org
Betreff: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
> My problem:
> I create cores dynamically using container#create( CoreDescriptor ) and then 
> add documents to the very core(s). So far so good.
> When I restart my app I do
> container = CoreContainer#createAndLoad(...) but when I then call 
> container.getAllCoreNames() an empty list is returned.
>
> What cores should be loaded by the container if I call
> CoreContainer#createAndLoad(...)
> ? Where does the container lookup the "existing cores"?

If the solr.xml is the old format, then cores are defined in solr.xml, in the
<cores> section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and later 
and will be mandatory in 5.0.  If that format is present, then Solr will use 
core discovery -- starting from either the solr home or a defined 
coreRootDirectory, solr will search for core.properties files and treat each 
directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn



Re: Reading data from another solr core

2015-01-28 Thread solrk
Thank you Alvaro Cabrerizo! I am going to give it a shot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466p4182758.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-01-28 Thread vsriram30
Thanks Shawn.  Not sure whether I will be able to test it out with 4.10.3.  I
will try the workarounds and update.

Thanks,
V.Sriram



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-open-new-searcher-not-happening-in-slave-for-deletebyID-tp4182439p4182757.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexFormatTooNewException

2015-01-28 Thread Shawn Heisey
On 1/28/2015 2:51 PM, Joshi, Shital wrote:
> Thank you for replying. 
>
> We added new shard to same cluster where some shards are showing Solr version 
> 4.10.0 and this new shard is showing Solr version 4.8.0. All shards source 
> Solr software from same location and use same start up script. I am surprised 
> how older shards are still running Solr 4.10.0.
>
> How we do real downgrade index to 4.8? You mean replay all data? 

It is often not enough to simply replace the solr war.  You may also
need to wipe out the extracted war before restarting, or jars from the
previous version may still exist and some of them might be loaded
instead of the new version.

If you're using the jetty included in the example, the war is in the
webapps directory and the extracted files are under solr-webapp.  If
you're using another container, then I have no idea where the war gets
extracted.

If any index segments were written by the 4.10 version, they will not be
readable after downgrading to the 4.8 version.  Wiping out the index and
rebuilding it from scratch is usually the only way to fix that situation.

Thanks,
Shawn



RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread fabio.bozzo
I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added a filter fq parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
"status": 0,
"QTime": 2,
"params": {
  "q": "go pro",
  "indent": "true",
  "fq": "marchio:\"GO PRO\"",
  "rows": "1",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "_": "1422485581792"
}
  },
  "response": {
"numFound": 27,
"start": 0,
"docs": [
  {
"codice_produttore_s": "DK00150020",
"codice_s": "5.BAT.27407",
"id": "27407",
"marchio": "GO PRO",
"barcode_interno_s": "185323000958",
"prezzo_acquisto_d": 16.12,
"data_aggiornamento_dt": "2012-06-21T00:00:00Z",
"descrizione": "BATTERIA GO PRO HERO ",
"prezzo_vendita_d": 39.9,
"categoria": "Batterie",
"_version_": 1491583424191791000
  },

 

]
  },
  "spellcheck": {
"suggestions": [
  "go pro",
  {
"numFound": 1,
"startOffset": 0,
"endOffset": 6,
"origFreq": 433,
"suggestion": [
  {
"word": "gopro",
"freq": 2
  }
]
  },
  "correctlySpelled",
  false,
  "collation",
  [
"collationQuery",
"gopro",
"hits",
3,
"misspellingsAndCorrections",
[
  "go pro",
  "gopro"
]
  ]
]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
"status": 0,
"QTime": 6,
"params": {
  "q": "gopro",
  "indent": "true",
  "fq": "marchio:\"GO PRO\"",
  "rows": "1",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "_": "1422485629480"
}
  },
  "response": {
"numFound": 3,
"start": 0,
"docs": [
  {
"codice_produttore_s": "DK0030010",
"codice_s": "5.VID.39163",
"id": "38814",
"marchio": "GO PRO",
"barcode_interno_s": "818279012477",
"prezzo_acquisto_d": 150.84,
"data_aggiornamento_dt": "2014-12-24T00:00:00Z",
"descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
"prezzo_vendita_d": 219,
"categoria": "Fotografia",
"_version_": 1491583425479442400
  },

]
  },
  "spellcheck": {
"suggestions": [
  "gopro",
  {
"numFound": 1,
"startOffset": 0,
"endOffset": 5,
"origFreq": 2,
"suggestion": [
  {
"word": "giro",
"freq": 6
  }
]
  },
  "correctlySpelled",
  false
]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: IndexFormatTooNewException

2015-01-28 Thread Joshi, Shital
Thank you for replying. 

We added a new shard to the same cluster, where some shards are showing Solr version
4.10.0 and this new shard is showing Solr version 4.8.0. All shards source the Solr
software from the same location and use the same start-up script. I am surprised that
the older shards are still running Solr 4.10.0.

How do we do a real downgrade of the index to 4.8? You mean replay all the data?

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, January 28, 2015 4:10 PM
To: solr-user@lucene.apache.org
Subject: Re: IndexFormatTooNewException


: We upgraded our cluster to Solr 4.10.0 for couple days and again 
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0. 
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the 
details of what steps you took to try and switch back make a huge
difference in understanding why you are still running 4.10.0 even though you
don't want to.


: We recently added new shards to our cluster and dashboard shows correct 
: Solr version (4.8.0) for these new shards. We copied index from one of 
: old shards (where it is showing 4.10.0 on dashboard) to this new shard 
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are 
asking Solr/Lucene to open an index that it can tell was created by a 
newer version of the software and it is incapable of doing so.

You either need to upgrade all of the nodes to 4.10, or you need to scrap
this index, do a *real* downgrade to 4.8, and then rebuild your index (or
restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path="/local/data/solr13/index.20140919180209018/segments_1tzz"))):
 3 (needs to be between 0 and 2)
: at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)
: 
: 
: 

-Hoss
http://www.lucidworks.com/


Re: replica never takes leader role

2015-01-28 Thread Mark Miller
Yes, after 45 seconds a replica should take over as leader. It should
likely explain in the logs of the replica that should be taking over why
this is not happening.

- Mark

On Wed Jan 28 2015 at 2:52:32 PM Joshi, Shital  wrote:

> When leader reaches 99% physical memory on the box and starts swapping
> (stops replicating), we forcefully bring down leader (first kill -15 and
> then kill -9 if kill -15 doesn't work). This is when we are looking up to
> replica to assume leader's role and it never happens.
>
> Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and
> test.
>
>  host="${host:}" hostPort="${jetty.port:8983}" 
> hostContext="${hostContext:solr}"
> zkClientTimeout="${zkClientTimeout:45000}">
>
> As per definition of zkClientTimeout, After the leader is brought down and
> it doesn't talk to zookeeper for 45 seconds, shouldn't ZK promote replica
> to leader? I am not sure how increasing zk timeout will help.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, January 28, 2015 11:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: replica never takes leader role
>
> This is not the desired behavior at all. I know there have been
> improvements in this area since 4.8, but can't seem to locate the JIRAs.
>
> I'm curious _why_ the nodes are going down though, is it happening at
> random or are you taking it down? One problem has been that the Zookeeper
> timeout used to default to 15 seconds, and occasionally a node would be
> unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
> the ZK timeout has helped some people avoid this...
>
> FWIW,
> Erick
>
> On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital 
> wrote:
>
> > We're using Solr 4.8.0
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, January 27, 2015 7:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: replica never takes leader role
> >
> > What version of Solr? This is an ongoing area of improvements and several
> > are very recent.
> >
> > Try searching the JIRA for Solr for details.
> >
> > Best,
> > Erick
> >
> > On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital 
> > wrote:
> >
> > > Hello,
> > >
> > > We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and
> three
> > > zookeeper instances. We have noticed that when a leader node goes down
> > the
> > > replica never takes over as a leader, cloud becomes unusable and we
> have
> > to
> > > bounce entire cloud for replica to assume leader role. Is this default
> > > behavior? How can we change this?
> > >
> > > Thanks.
> > >
> > >
> > >
> >
>


Re: Reindex data without creating new index.

2015-01-28 Thread SolrUser1543
By rebalancing I mean that such a big amount of updates will create a
situation which will require running an optimization of the index, because each
document will be added again instead of the original one.

But according to what you say it should not be a problem, am I correct?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-data-without-creating-new-index-tp4182464p4182726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexFormatTooNewException

2015-01-28 Thread Chris Hostetter

: We upgraded our cluster to Solr 4.10.0 for couple days and again 
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0. 
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the 
details of what steps you took to try and switch back make a huge
difference in understanding why you are still running 4.10.0 even though you
don't want to.


: We recently added new shards to our cluster and dashboard shows correct 
: Solr version (4.8.0) for these new shards. We copied index from one of 
: old shards (where it is showing 4.10.0 on dashboard) to this new shard 
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are 
asking Solr/Lucene to open an index that it can tell was created by a 
newer version of the software and it is incapable of doing so.

You either need to upgrade all of the nodes to 4.10, or you need to scrap
this index, do a *real* downgrade to 4.8, and then rebuild your index (or
restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path="/local/data/solr13/index.20140919180209018/segments_1tzz"))):
 3 (needs to be between 0 and 2)
: at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)
: 
: 
: 

-Hoss
http://www.lucidworks.com/


IndexFormatTooNewException

2015-01-28 Thread Joshi, Shital
Hi,

We upgraded our cluster to Solr 4.10.0 for a couple of days and then reverted back 
to 4.8.0. However the dashboard still shows Solr 4.10.0. Do you know why?
*   solr-spec 4.10.0
*   solr-impl 4.10.0 1620776
*   lucene-spec 4.10.0
*   lucene-impl 4.10.0 1620776

We recently added new shards to our cluster and dashboard shows correct Solr 
version (4.8.0) for these new shards. We copied index from one of old shards 
(where it is showing 4.10.0 on dashboard) to this new shard and we see this 
error upon start up. How do we get rid of this error?

Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version 
is not supported (resource: 
BufferedChecksumIndexInput(MMapIndexInput(path="/local/data/solr13/index.20140919180209018/segments_1tzz"))):
 3 (needs to be between 0 and 2)
at 
org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)




RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
When the leader reaches 99% physical memory on the box and starts swapping (stops 
replicating), we forcefully bring down the leader (first kill -15 and then kill -9 
if kill -15 doesn't work). This is when we expect the replica to assume the 
leader's role, and it never happens.

Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and test. 



As per the definition of zkClientTimeout, after the leader is brought down and it 
doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the replica to 
leader? I am not sure how increasing the ZK timeout will help.
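
For reference, in the old-style solr.xml format this timeout is an attribute on the
<cores> element; a sketch with the value discussed here (the surrounding attributes
are assumed):

<cores adminPath="/admin/cores"
       host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
       zkClientTimeout="${zkClientTimeout:45000}">
  ...
</cores>

With that property substitution in place, a larger value can be tried at startup
without editing the file, e.g. -DzkClientTimeout=120000.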

 
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, January 28, 2015 11:42 AM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital  wrote:

> We're using Solr 4.8.0
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, January 27, 2015 7:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: replica never takes leader role
>
> What version of Solr? This is an ongoing area of improvements and several
> are very recent.
>
> Try searching the JIRA for Solr for details.
>
> Best,
> Erick
>
> On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital 
> wrote:
>
> > Hello,
> >
> > We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
> > zookeeper instances. We have noticed that when a leader node goes down
> the
> > replica never takes over as a leader, cloud becomes unusable and we have
> to
> > bounce entire cloud for replica to assume leader role. Is this default
> > behavior? How can we change this?
> >
> > Thanks.
> >
> >
> >
>


Re: replicas goes in recovery mode right after update

2015-01-28 Thread Erick Erickson
Vijay:

Thanks for reporting this back!  Could I ask you to post a new patch with
your correction? Please use the same patch name
(SOLR-5850.patch), and include a note about what you found (I've already
added a comment).

Thanks!
Erick

On Wed, Jan 28, 2015 at 9:18 AM, Vijay Sekhri  wrote:

> Hi Shawn,
> Thank you so much for the assistance. Building is not a problem . Back in
> the days I have worked with linking, compiling and  building C , C++
> software . Java is a piece of cake.
> We have built the new war from the source version 4.10.3 and our
> preliminary tests have shown that our issue (replicas in recovery on high
> load)* is resolved *. We will continue to do more testing and confirm .
> Please note that the *patch is BUGGY*.
>
> It removed the break statement within while loop because of which, whenever
> we send a list of docs it would hang (API CloudSolrServer.add) , but it
> would work if send one doc at a time.
>
> It took a while to figure out why that is happening. Once we put the break
> statement back it worked like a charm.
> Furthermore the patch has
>
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java
> which should be
>
> solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java
>
> Finally checking if(!offer) is sufficient than using if(offer == false)
> Last but not the least having a configurable queue size and timeouts
> (managed via solrconfig) would be quite helpful
> Thank you once again for your help.
>
> Vijay
>
> On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey  wrote:
>
> > On 1/27/2015 2:52 PM, Vijay Sekhri wrote:
> > > Hi Shawn,
> > > Here is some update. We found the main issue
> > > We have configured our cluster to run under jetty and when we tried
> full
> > > indexing, we did not see the original Invalid Chunk error. However the
> > > replicas still went into recovery
> > > All this time we been trying to look into replicas logs to diagnose the
> > > issue. The problem seem to be at the leader side. When we looked into
> > > leader logs, we found the following on all the leaders
> > >
> > > 3439873 [qtp1314570047-92] WARN
> > >  org.apache.solr.update.processor.DistributedUpdateProcessor  – Error
> > > sending update
> > > *java.lang.IllegalStateException: Queue full*
> >
> > 
> >
> > > There is a similar bug reported around this
> > > https://issues.apache.org/jira/browse/SOLR-5850
> > >
> > > and it seem to be in OPEN status. Is there a way we can configure the
> > queue
> > > size and increase it ? or is there a version of solr that has this
> issue
> > > resolved already?
> > > Can you suggest where we go from here to resolve this ? We can repatch
> > the
> > > war file if that is what you would recommend .
> > > In the end our initial speculation about solr unable to handle so many
> > > update is correct. We do not see this issue when the update load is
> less.
> >
> > Are you in a position where you can try the patch attached to
> > SOLR-5850?  You would need to get the source code for the version you're
> > on (or perhaps a newer 4.x version), patch it, and build Solr yourself.
> > If you have no experience building java packages from source, this might
> > prove to be difficult.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> *
> Vijay Sekhri
> *
>


Issue on server restarts with Solr 4.6.0 Cloud

2015-01-28 Thread andrew jenner
Using Solr 4.6.0 on linux with Java 6 (Oracle JRockit 1.6.0_75
R28.3.2-14-160877-1.6.0_75)


We are seeing these issues when doing a restart on a Solr Cloud
configuration. After restarting each server in sequence, none of them
will come up. The servers start up after a long time, but the cloud
status shows the Solr nodes as being down.


java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:87)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:603)
at 
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:778)
at 
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at 
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:71)
at 
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at 
org.apache.solr.update.TransactionLog$FSReverseReader.<init>(TransactionLog.java:696)
at 
org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:575)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:942)
at 
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:885)
at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1043)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:280)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


SnapPull failed :org.apache.lucene.store.AlreadyClosedException: Already closed
at 
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:340)
at 
org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:811)
at 
org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:564)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:506)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:433)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


Error while trying to recover.
core=[REDACTED]:org.apache.solr.common.SolrException: No registered
leader was found, collection:[REDACTED] slice:shard1
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)


Re: extract and add fields on the fly

2015-01-28 Thread Mark
Thanks Alexandre,

I figured it out with this example,

https://wiki.apache.org/solr/ExtractingRequestHandler

whereby you can add additional fields at upload/extract time

curl "
http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3&literal.blah_s=Bah";
-F "tutorial=@"help.pdf

and therefore I learned that you can't update a field that isn't in the
original which is what I was trying to do before.

Regards

Mark



On 28 January 2015 at 18:38, Alexandre Rafalovitch 
wrote:

> Well, the schema does need to know what type your field is. If you
> can't add it to schema, use dynamicFields with prefixe/suffixes or
> dynamic schema (less recommended).
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 28 January 2015 at 13:32, Mark  wrote:
> > That approach works although as suspected the schma has to recognise the
> > additinal facet (stuff in this case):
> >
> > "responseHeader":{"status":400,"QTime":1},"error":{"msg":"ERROR:
> > [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
> > 'stuff'","code":400}}
> >
> > ..getting closer..
> >
> > On 28 January 2015 at 18:03, Mark  wrote:
> >
> >>
> >> Use case is
> >>
> >> use curl to upload/extract/index document passing in additional facets
> not
> >> present in the document e.g. literal.source="old system"
> >>
> >> In this way some fields come from the uploaded extracted content and
> some
> >> fields as specified in the curl URL
> >>
> >> Hope that's clearer?
> >>
> >> Regards
> >>
> >> Mark
> >>
> >>
> >> On 28 January 2015 at 17:54, Alexandre Rafalovitch 
> >> wrote:
> >>
> >>> Sounds like 'literal.X' syntax from
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
> >>>
> >>> Can you explain your use case as different from what's already
> >>> documented? May be easier to understand.
> >>>
> >>> Regards,
> >>>Alex.
> >>> 
> >>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> >>>
> >>>
> >>> On 28 January 2015 at 12:45, Mark  wrote:
> >>> > I'm looking to
> >>> >
> >>> > 1) upload a binary document using curl
> >>> > 2) add some additional facets
> >>> >
> >>> > Specifically my question is can this be achieved in 1 curl operation
> or
> >>> > does it need 2?
> >>> >
> >>> > On 28 January 2015 at 17:43, Mark  wrote:
> >>> >
> >>> >>
> >>> >> Second thoughts SID is purely i/p as its name suggests :)
> >>> >>
> >>> >> I think a better approach would be
> >>> >>
> >>> >> 1) curl to upload/extract passing docID
> >>> >> 2) curl to update additional fields for that docID
> >>> >>
> >>> >>
> >>> >>
> >>> >> On 28 January 2015 at 17:30, Mark  wrote:
> >>> >>
> >>> >>>
> >>> >>> "Create the SID from the existing doc" implies that a document
> already
> >>> >>> exists that you wish to add fields to.
> >>> >>>
> >>> >>> However if the document is a binary are you suggesting
> >>> >>>
> >>> >>> 1) curl to upload/extract passing docID
> >>> >>> 2) obtain a SID based off docID
> >>> >>> 3) add addtinal fields to SID & commit
> >>> >>>
> >>> >>> I know I'm possibly wandering into the schemaless teritory here as
> >>> well
> >>> >>>
> >>> >>>
> >>> >>> On 28 January 2015 at 17:11, Andrew Pawloski 
> >>> wrote:
> >>> >>>
> >>>  I would switch the order of those. Add the new fields and *then*
> >>> index to
> >>>  solr.
> >>> 
> >>>  We do something similar when we create SolrInputDocuments that are
> >>> pushed
> >>>  to solr. Create the SID from the existing doc, add any additional
> >>> fields,
> >>>  then add to solr.
> >>> 
> >>>  On Wed, Jan 28, 2015 at 11:56 AM, Mark 
> wrote:
> >>> 
> >>>  > Is it possible to use curl to upload a document (for extract &
> >>>  indexing)
> >>>  > and specify some fields on the fly?
> >>>  >
> >>>  > sort of:
> >>>  > 1) index this document
> >>>  > 2) by the way here are some important facets whilst your at it
> >>>  >
> >>>  > Regards
> >>>  >
> >>>  > Mark
> >>>  >
> >>> 
> >>> >>>
> >>> >>>
> >>> >>
> >>>
> >>
> >>
>


Re: extract and add fields on the fly

2015-01-28 Thread Alexandre Rafalovitch
Well, the schema does need to know what type your field is. If you
can't add it to the schema, use dynamicFields with prefixes/suffixes or
a dynamic schema (less recommended).

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/
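
For example, with the stock *_s dynamic field from the example schema, the extra
facet can be passed as a literal on the extract URL (a sketch; the file and field
names are illustrative):

curl "http://localhost:8983/solr/update/extract?literal.id=doc4&literal.source_s=old+system&commit=true" \
  -F "file=@help.pdf"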


On 28 January 2015 at 13:32, Mark  wrote:
> That approach works although as suspected the schma has to recognise the
> additinal facet (stuff in this case):
>
> "responseHeader":{"status":400,"QTime":1},"error":{"msg":"ERROR:
> [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
> 'stuff'","code":400}}
>
> ..getting closer..
>
> On 28 January 2015 at 18:03, Mark  wrote:
>
>>
>> Use case is
>>
>> use curl to upload/extract/index document passing in additional facets not
>> present in the document e.g. literal.source="old system"
>>
>> In this way some fields come from the uploaded extracted content and some
>> fields as specified in the curl URL
>>
>> Hope that's clearer?
>>
>> Regards
>>
>> Mark
>>
>>
>> On 28 January 2015 at 17:54, Alexandre Rafalovitch 
>> wrote:
>>
>>> Sounds like 'literal.X' syntax from
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>>>
>>> Can you explain your use case as different from what's already
>>> documented? May be easier to understand.
>>>
>>> Regards,
>>>Alex.
>>> 
>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>
>>>
>>> On 28 January 2015 at 12:45, Mark  wrote:
>>> > I'm looking to
>>> >
>>> > 1) upload a binary document using curl
>>> > 2) add some additional facets
>>> >
>>> > Specifically my question is can this be achieved in 1 curl operation or
>>> > does it need 2?
>>> >
>>> > On 28 January 2015 at 17:43, Mark  wrote:
>>> >
>>> >>
>>> >> Second thoughts SID is purely i/p as its name suggests :)
>>> >>
>>> >> I think a better approach would be
>>> >>
>>> >> 1) curl to upload/extract passing docID
>>> >> 2) curl to update additional fields for that docID
>>> >>
>>> >>
>>> >>
>>> >> On 28 January 2015 at 17:30, Mark  wrote:
>>> >>
>>> >>>
>>> >>> "Create the SID from the existing doc" implies that a document already
>>> >>> exists that you wish to add fields to.
>>> >>>
>>> >>> However if the document is a binary are you suggesting
>>> >>>
>>> >>> 1) curl to upload/extract passing docID
>>> >>> 2) obtain a SID based off docID
>>> >>> 3) add addtinal fields to SID & commit
>>> >>>
>>> >>> I know I'm possibly wandering into the schemaless teritory here as
>>> well
>>> >>>
>>> >>>
>>> >>> On 28 January 2015 at 17:11, Andrew Pawloski 
>>> wrote:
>>> >>>
>>>  I would switch the order of those. Add the new fields and *then*
>>> index to
>>>  solr.
>>> 
>>>  We do something similar when we create SolrInputDocuments that are
>>> pushed
>>>  to solr. Create the SID from the existing doc, add any additional
>>> fields,
>>>  then add to solr.
>>> 
>>>  On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
>>> 
>>>  > Is it possible to use curl to upload a document (for extract &
>>>  indexing)
>>>  > and specify some fields on the fly?
>>>  >
>>>  > sort of:
>>>  > 1) index this document
>>>  > 2) by the way here are some important facets whilst your at it
>>>  >
>>>  > Regards
>>>  >
>>>  > Mark
>>>  >
>>> 
>>> >>>
>>> >>>
>>> >>
>>>
>>
>>


Re: extract and add fields on the fly

2015-01-28 Thread Mark
That approach works, although as suspected the schema has to recognise the
additional facet (stuff in this case):

"responseHeader":{"status":400,"QTime":1},"error":{"msg":"ERROR:
[doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field
'stuff'","code":400}}

..getting closer..

On 28 January 2015 at 18:03, Mark  wrote:

>
> Use case is
>
> use curl to upload/extract/index document passing in additional facets not
> present in the document e.g. literal.source="old system"
>
> In this way some fields come from the uploaded extracted content and some
> fields as specified in the curl URL
>
> Hope that's clearer?
>
> Regards
>
> Mark
>
>
> On 28 January 2015 at 17:54, Alexandre Rafalovitch 
> wrote:
>
>> Sounds like 'literal.X' syntax from
>>
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>>
>> Can you explain your use case as different from what's already
>> documented? May be easier to understand.
>>
>> Regards,
>>Alex.
>> 
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 28 January 2015 at 12:45, Mark  wrote:
>> > I'm looking to
>> >
>> > 1) upload a binary document using curl
>> > 2) add some additional facets
>> >
>> > Specifically my question is can this be achieved in 1 curl operation or
>> > does it need 2?
>> >
>> > On 28 January 2015 at 17:43, Mark  wrote:
>> >
>> >>
>> >> Second thoughts SID is purely i/p as its name suggests :)
>> >>
>> >> I think a better approach would be
>> >>
>> >> 1) curl to upload/extract passing docID
>> >> 2) curl to update additional fields for that docID
>> >>
>> >>
>> >>
>> >> On 28 January 2015 at 17:30, Mark  wrote:
>> >>
>> >>>
>> >>> "Create the SID from the existing doc" implies that a document already
>> >>> exists that you wish to add fields to.
>> >>>
>> >>> However if the document is a binary are you suggesting
>> >>>
>> >>> 1) curl to upload/extract passing docID
>> >>> 2) obtain a SID based off docID
>> >>> 3) add addtinal fields to SID & commit
>> >>>
>> >>> I know I'm possibly wandering into the schemaless teritory here as
>> well
>> >>>
>> >>>
>> >>> On 28 January 2015 at 17:11, Andrew Pawloski 
>> wrote:
>> >>>
>>  I would switch the order of those. Add the new fields and *then*
>> index to
>>  solr.
>> 
>>  We do something similar when we create SolrInputDocuments that are
>> pushed
>>  to solr. Create the SID from the existing doc, add any additional
>> fields,
>>  then add to solr.
>> 
>>  On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
>> 
>>  > Is it possible to use curl to upload a document (for extract &
>>  indexing)
>>  > and specify some fields on the fly?
>>  >
>>  > sort of:
>>  > 1) index this document
>>  > 2) by the way here are some important facets whilst your at it
>>  >
>>  > Regards
>>  >
>>  > Mark
>>  >
>> 
>> >>>
>> >>>
>> >>
>>
>
>


Re: extract and add fields on the fly

2015-01-28 Thread Mark
Use case is

use curl to upload/extract/index document passing in additional facets not
present in the document e.g. literal.source="old system"

In this way some fields come from the uploaded extracted content and some
fields as specified in the curl URL

Hope that's clearer?

Regards

Mark


On 28 January 2015 at 17:54, Alexandre Rafalovitch 
wrote:

> Sounds like 'literal.X' syntax from
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>
> Can you explain your use case as different from what's already
> documented? May be easier to understand.
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 28 January 2015 at 12:45, Mark  wrote:
> > I'm looking to
> >
> > 1) upload a binary document using curl
> > 2) add some additional facets
> >
> > Specifically my question is can this be achieved in 1 curl operation or
> > does it need 2?
> >
> > On 28 January 2015 at 17:43, Mark  wrote:
> >
> >>
> >> Second thoughts SID is purely i/p as its name suggests :)
> >>
> >> I think a better approach would be
> >>
> >> 1) curl to upload/extract passing docID
> >> 2) curl to update additional fields for that docID
> >>
> >>
> >>
> >> On 28 January 2015 at 17:30, Mark  wrote:
> >>
> >>>
> >>> "Create the SID from the existing doc" implies that a document already
> >>> exists that you wish to add fields to.
> >>>
> >>> However if the document is a binary are you suggesting
> >>>
> >>> 1) curl to upload/extract passing docID
> >>> 2) obtain a SID based off docID
> >>> 3) add addtinal fields to SID & commit
> >>>
> >>> I know I'm possibly wandering into the schemaless teritory here as well
> >>>
> >>>
> >>> On 28 January 2015 at 17:11, Andrew Pawloski 
> wrote:
> >>>
>  I would switch the order of those. Add the new fields and *then*
> index to
>  solr.
> 
>  We do something similar when we create SolrInputDocuments that are
> pushed
>  to solr. Create the SID from the existing doc, add any additional
> fields,
>  then add to solr.
> 
>  On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
> 
>  > Is it possible to use curl to upload a document (for extract &
>  indexing)
>  > and specify some fields on the fly?
>  >
>  > sort of:
>  > 1) index this document
>  > 2) by the way here are some important facets whilst your at it
>  >
>  > Regards
>  >
>  > Mark
>  >
> 
> >>>
> >>>
> >>
>


Re: extract and add fields on the fly

2015-01-28 Thread Alexandre Rafalovitch
Sounds like 'literal.X' syntax from
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Can you explain your use case as different from what's already
documented? May be easier to understand.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 28 January 2015 at 12:45, Mark  wrote:
> I'm looking to
>
> 1) upload a binary document using curl
> 2) add some additional facets
>
> Specifically my question is can this be achieved in 1 curl operation or
> does it need 2?
>
> On 28 January 2015 at 17:43, Mark  wrote:
>
>>
>> Second thoughts SID is purely i/p as its name suggests :)
>>
>> I think a better approach would be
>>
>> 1) curl to upload/extract passing docID
>> 2) curl to update additional fields for that docID
>>
>>
>>
>> On 28 January 2015 at 17:30, Mark  wrote:
>>
>>>
>>> "Create the SID from the existing doc" implies that a document already
>>> exists that you wish to add fields to.
>>>
>>> However if the document is a binary are you suggesting
>>>
>>> 1) curl to upload/extract passing docID
>>> 2) obtain a SID based off docID
>>> 3) add addtinal fields to SID & commit
>>>
>>> I know I'm possibly wandering into the schemaless teritory here as well
>>>
>>>
>>> On 28 January 2015 at 17:11, Andrew Pawloski  wrote:
>>>
 I would switch the order of those. Add the new fields and *then* index to
 solr.

 We do something similar when we create SolrInputDocuments that are pushed
 to solr. Create the SID from the existing doc, add any additional fields,
 then add to solr.

 On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:

 > Is it possible to use curl to upload a document (for extract &
 indexing)
 > and specify some fields on the fly?
 >
 > sort of:
 > 1) index this document
 > 2) by the way here are some important facets whilst your at it
 >
 > Regards
 >
 > Mark
 >

>>>
>>>
>>


Re: extract and add fields on the fly

2015-01-28 Thread Mark
I'm looking to

1) upload a binary document using curl
2) add some additional facets

Specifically my question is can this be achieved in 1 curl operation or
does it need 2?

On 28 January 2015 at 17:43, Mark  wrote:

>
> Second thoughts SID is purely i/p as its name suggests :)
>
> I think a better approach would be
>
> 1) curl to upload/extract passing docID
> 2) curl to update additional fields for that docID
>
>
>
> On 28 January 2015 at 17:30, Mark  wrote:
>
>>
>> "Create the SID from the existing doc" implies that a document already
>> exists that you wish to add fields to.
>>
>> However if the document is a binary are you suggesting
>>
>> 1) curl to upload/extract passing docID
>> 2) obtain a SID based off docID
>> 3) add addtinal fields to SID & commit
>>
>> I know I'm possibly wandering into the schemaless teritory here as well
>>
>>
>> On 28 January 2015 at 17:11, Andrew Pawloski  wrote:
>>
>>> I would switch the order of those. Add the new fields and *then* index to
>>> solr.
>>>
>>> We do something similar when we create SolrInputDocuments that are pushed
>>> to solr. Create the SID from the existing doc, add any additional fields,
>>> then add to solr.
>>>
>>> On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
>>>
>>> > Is it possible to use curl to upload a document (for extract &
>>> indexing)
>>> > and specify some fields on the fly?
>>> >
>>> > sort of:
>>> > 1) index this document
>>> > 2) by the way here are some important facets whilst your at it
>>> >
>>> > Regards
>>> >
>>> > Mark
>>> >
>>>
>>
>>
>


Re: extract and add fields on the fly

2015-01-28 Thread Mark
Second thoughts SID is purely i/p as its name suggests :)

I think a better approach would be

1) curl to upload/extract passing docID
2) curl to update additional fields for that docID



On 28 January 2015 at 17:30, Mark  wrote:

>
> "Create the SID from the existing doc" implies that a document already
> exists that you wish to add fields to.
>
> However if the document is a binary are you suggesting
>
> 1) curl to upload/extract passing docID
> 2) obtain a SID based off docID
> 3) add addtinal fields to SID & commit
>
> I know I'm possibly wandering into the schemaless teritory here as well
>
>
> On 28 January 2015 at 17:11, Andrew Pawloski  wrote:
>
>> I would switch the order of those. Add the new fields and *then* index to
>> solr.
>>
>> We do something similar when we create SolrInputDocuments that are pushed
>> to solr. Create the SID from the existing doc, add any additional fields,
>> then add to solr.
>>
>> On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
>>
>> > Is it possible to use curl to upload a document (for extract & indexing)
>> > and specify some fields on the fly?
>> >
>> > sort of:
>> > 1) index this document
>> > 2) by the way here are some important facets whilst your at it
>> >
>> > Regards
>> >
>> > Mark
>> >
>>
>
>


Re: extract and add fields on the fly

2015-01-28 Thread Andrew Pawloski
Sorry, I may have misunderstood:

Are you talking about adding additional fields at indexing time? (Here I
would add the fields first *then* send to solr.)

Are you talking about updating a field within an existing document in a
Solr index? (In that case I would direct you here [1].)

Am I still misunderstanding?

[1]
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
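
A minimal sketch of the atomic-update approach from [1] (assumes the id already
exists, the field is stored, and <updateLog/> is enabled in solrconfig.xml; the
core and field names are illustrative):

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc4","source_s":{"set":"old system"}}]'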

On Wed, Jan 28, 2015 at 12:30 PM, Mark  wrote:

> "Create the SID from the existing doc" implies that a document already
> exists that you wish to add fields to.
>
> However if the document is a binary are you suggesting
>
> 1) curl to upload/extract passing docID
> 2) obtain a SID based off docID
> 3) add addtinal fields to SID & commit
>
> I know I'm possibly wandering into the schemaless teritory here as well
>
>
> On 28 January 2015 at 17:11, Andrew Pawloski  wrote:
>
> > I would switch the order of those. Add the new fields and *then* index to
> > solr.
> >
> > We do something similar when we create SolrInputDocuments that are pushed
> > to solr. Create the SID from the existing doc, add any additional fields,
> > then add to solr.
> >
> > On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
> >
> > > Is it possible to use curl to upload a document (for extract &
> indexing)
> > > and specify some fields on the fly?
> > >
> > > sort of:
> > > 1) index this document
> > > 2) by the way here are some important facets whilst your at it
> > >
> > > Regards
> > >
> > > Mark
> > >
> >
>


Re: extract and add fields on the fly

2015-01-28 Thread Mark
"Create the SID from the existing doc" implies that a document already
exists that you wish to add fields to.

However if the document is a binary are you suggesting

1) curl to upload/extract passing docID
2) obtain a SID based off docID
3) add additional fields to SID & commit

I know I'm possibly wandering into the schemaless territory here as well


On 28 January 2015 at 17:11, Andrew Pawloski  wrote:

> I would switch the order of those. Add the new fields and *then* index to
> solr.
>
> We do something similar when we create SolrInputDocuments that are pushed
> to solr. Create the SID from the existing doc, add any additional fields,
> then add to solr.
>
> On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:
>
> > Is it possible to use curl to upload a document (for extract & indexing)
> > and specify some fields on the fly?
> >
> > sort of:
> > 1) index this document
> > 2) by the way here are some important facets whilst your at it
> >
> > Regards
> >
> > Mark
> >
>


Re: replicas goes in recovery mode right after update

2015-01-28 Thread Vijay Sekhri
Hi Shawn,
Thank you so much for the assistance. Building is not a problem. Back in
the day I worked with linking, compiling and building C and C++
software. Java is a piece of cake.
We have built the new war from the source version 4.10.3 and our
preliminary tests have shown that our issue (replicas in recovery on high
load)* is resolved *. We will continue to do more testing and confirm .
Please note that the *patch is BUGGY*.

It removed the break statement within the while loop, because of which, whenever
we sent a list of docs it would hang (in the CloudSolrServer.add API), but it
would work if we sent one doc at a time.

It took a while to figure out why that is happening. Once we put the break
statement back it worked like a charm.
Furthermore the patch has
solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java
which should be
solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java

Finally, checking if(!offer) is sufficient, rather than using if(offer == false).
Last but not least, having a configurable queue size and timeouts
(managed via solrconfig) would be quite helpful.
Thank you once again for your help.

Vijay
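
For context, the batched add that hung with the unmodified patch is the plain SolrJ
call sketched below (the ZooKeeper hosts, collection and field names are illustrative):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        // Build a batch of documents and send them in one add call.
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("name_s", "document " + i);
            batch.add(doc);
        }
        server.add(batch);   // the call that hung without the break statement mentioned above
        server.commit();
        server.shutdown();
    }
}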

On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey  wrote:

> On 1/27/2015 2:52 PM, Vijay Sekhri wrote:
> > Hi Shawn,
> > Here is some update. We found the main issue
> > We have configured our cluster to run under jetty and when we tried full
> > indexing, we did not see the original Invalid Chunk error. However the
> > replicas still went into recovery
> > All this time we been trying to look into replicas logs to diagnose the
> > issue. The problem seem to be at the leader side. When we looked into
> > leader logs, we found the following on all the leaders
> >
> > 3439873 [qtp1314570047-92] WARN
> >  org.apache.solr.update.processor.DistributedUpdateProcessor  – Error
> > sending update
> > *java.lang.IllegalStateException: Queue full*
>
> 
>
> > There is a similar bug reported around this
> > https://issues.apache.org/jira/browse/SOLR-5850
> >
> > and it seem to be in OPEN status. Is there a way we can configure the
> queue
> > size and increase it ? or is there a version of solr that has this issue
> > resolved already?
> > Can you suggest where we go from here to resolve this ? We can repatch
> the
> > war file if that is what you would recommend .
> > In the end our initial speculation about solr unable to handle so many
> > update is correct. We do not see this issue when the update load is less.
>
> Are you in a position where you can try the patch attached to
> SOLR-5850?  You would need to get the source code for the version you're
> on (or perhaps a newer 4.x version), patch it, and build Solr yourself.
> If you have no experience building java packages from source, this might
> prove to be difficult.
>
> Thanks,
> Shawn
>
>


-- 
*
Vijay Sekhri
*


Re: extract and add fields on the fly

2015-01-28 Thread Andrew Pawloski
I would switch the order of those. Add the new fields and *then* index to
solr.

We do something similar when we create SolrInputDocuments that are pushed
to solr. Create the SID from the existing doc, add any additional fields,
then add to solr.
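
A minimal sketch of that flow in SolrJ 4.x (the server URL and field names are
illustrative; ClientUtils does the SolrDocument-to-SolrInputDocument copy):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class AddFieldsThenIndex {
    public static void index(SolrDocument existingDoc) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Copy the existing document, add the extra fields, then send it to Solr.
        SolrInputDocument sid = ClientUtils.toSolrInputDocument(existingDoc);
        sid.addField("source_s", "old system");
        server.add(sid);
        server.commit();
    }
}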

On Wed, Jan 28, 2015 at 11:56 AM, Mark  wrote:

> Is it possible to use curl to upload a document (for extract & indexing)
> and specify some fields on the fly?
>
> sort of:
> 1) index this document
> 2) by the way here are some important facets whilst your at it
>
> Regards
>
> Mark
>


Re: How to implement Auto complete, suggestion client side

2015-01-28 Thread Olivier Austina
Hi,

Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me.

Regards
Olivier


2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch :

> You've got a lot of options depending on what you want. But since you
> seem to just want _an_ example, you can use mine from
> http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search
> box there).
>
> You can see the source for the test screen (using Spring Boot and
> Spring Data Solr as a middle-layer) and Select2 for the UI at:
> https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer.
> The Solr definition is at:
>
> https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf
>
> Other implementation pieces are in that (and another) public
> repository as well, but it's all in Java. You'll probably want to do
> something similar in PHP.
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 26 January 2015 at 17:11, Olivier Austina 
> wrote:
> > Hi All,
> >
> > I would say I am new to web technology.
> >
> > I would like to implement auto complete/suggestion in the user search box
> > as the user type in the search box (like Google for example). I am using
> > Solr as database. Basically I am  familiar with Solr and I can formulate
> > suggestion queries.
> >
> > But now I don't know how to implement suggestion in the User Interface.
> > Which technologies should I need. The website is in PHP. Any suggestions,
> > examples, basic tutorial is welcome. Thank you.
> >
> >
> >
> > Regards
> > Olivier
>


extract and add fields on the fly

2015-01-28 Thread Mark
Is it possible to use curl to upload a document (for extract & indexing)
and specify some fields on the fly?

sort of:
1) index this document
2) by the way here are some important facets whilst you're at it

Regards

Mark


Re: PostingsHighlighter highlighted snippet size (fragsize)

2015-01-28 Thread Zisis Tachtsidis
It seems that a solution has been found.

PostingsHighlighter uses by default Java's SENTENCE BreakIterator so it
breaks the snippets into fragments per sentence.
In my text_en analysis chain though I was using a filter that lowercases
input and this seems to mess with the logic of SENTENCE BreakIterator.
Removing the filter did the trick.
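
For what it's worth, the break iterator can also be chosen per request through the
hl.bs.* parameters rather than relying on the SENTENCE default (a sketch; the field
name is illustrative):

q=content:lucene&hl=true&hl.fl=content&hl.bs.type=WHOLE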

Apart from that there is a new issue now. I'm trying to search on one field
and highlight another, and this does not seem to be working even if I use the
exact same analyzers for both fields. I get the correct results in the
highlighting section but there is no highlight. Digging deeper I've found
inside PostingsHighlighter.highlightFieldsAsObjects() (line 393 in version
4.10.3) that the fields to be highlighted (I guess) are the intersection of
the query terms set (fields used in the search query) and the set of fields
to be highlighted (defined by the hl.fl param). So, unless I use the field
to be highlighted in the search query I get no highlight.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PostingsHighlighter-highlighted-snippet-size-fragsize-tp4180634p4182596.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: replica never takes leader role

2015-01-28 Thread Erick Erickson
This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...
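
As a point of reference (not from the original message), the timeout lives in the
solrcloud section of solr.xml in the new-style format; 30 seconds below is only an
illustrative value, and if solr.xml references the system property it can also be
overridden with -DzkClientTimeout=30000 at startup:

<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  ...
</solrcloud>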

FWIW,
Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital  wrote:

> We're using Solr 4.8.0
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, January 27, 2015 7:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: replica never takes leader role
>
> What version of Solr? This is an ongoing area of improvements and several
> are very recent.
>
> Try searching the JIRA for Solr for details.
>
> Best,
> Erick
>
> On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital 
> wrote:
>
> > Hello,
> >
> > We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
> > zookeeper instances. We have noticed that when a leader node goes down
> the
> > replica never takes over as a leader, cloud becomes unusable and we have
> to
> > bounce entire cloud for replica to assume leader role. Is this default
> > behavior? How can we change this?
> >
> > Thanks.
> >
> >
> >
>


Re: CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Shawn Heisey
On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote:
> My problem:
> I create cores dynamically using container#create( CoreDescriptor ) and then 
> add documents to the very core(s). So far so good.
> When I restart my app I do
> container = CoreContainer#createAndLoad(...)
> but when I then call container.getAllCoreNames() an empty list is returned.
>
> What cores should be loaded by the container if I call
> CoreContainer#createAndLoad(...)
> ? Where does the container lookup the "existing cores"?

If the solr.xml is the old format, then cores are defined in solr.xml,
in the  section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and
later and will be mandatory in 5.0.  If that format is present, then
Solr will use core discovery -- starting from either the solr home or a
defined coreRootDirectory, solr will search for core.properties files
and treat each directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond
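
As a small illustration of core discovery (not from the original message), each
core directory only needs a core.properties file, which in the simplest case can
even be empty or just name the core:

# collection1/core.properties
name=collection1
loadOnStartup=true

With the old-style solr.xml, the same core would instead have to be declared
inside the <cores> element of solr.xml.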

Thanks,
Shawn



Re: Solrcloud open new searcher not happening in slave for deletebyID

2015-01-28 Thread Shawn Heisey
On 1/27/2015 5:50 PM, vsriram30 wrote:
> I am using SolrCloud 4.6.1. If I use CloudSolrServer to add a record
> to Solr, then I see the following commit update command in both the master
> and the slave node:

One of the first things to find out is whether it's still a problem in
the latest version of Solr, which is currently 4.10.3.  Solr 4.6.1 is a
year old, and there have been seven new versions released since then. 
Solr, especially SolrCloud, changes at a VERY rapid pace ... in each
version, many bugs are fixed, and each x.y.0 version adds new
features/functionality.

I'm not in a position to set up a minimal SolrCloud testbed to try this
out, or I would try it myself.

> 2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
>
> I am also setting the updateRequest.setCommitWithin(5000);
>
> Here as noticed, the openSearcher=true and hence after 5 seconds, I am able
> to see the record in index in both slave and in master.
>
> Now if I trigger another UpdateRequest with only deleteById set and no add
> documents to Solr, with the same commit within time, then 
>
> in the master log I see,
>
> 2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
>
> and in the slave log I see,
> 2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start
> commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> Here as noticed, the master is having openSearcher=true and slave is having
> openSearcher=false. This causes inconsistency in the results as master shows
> that the record is deleted and slave still has the record.
>
> After digging through the code a bit, I think this is probably happening in
> CommitTracker where the openSearcher might be false while creating the
> CommitUpdateCommand.
>
> Can you advise if there is any ticket created to address this issue or can I
> create one? Also is there any workaround for this till the bug is fixed than
> to set commit within duration in server to a lower value?

It does sound like a bug.  Some possible workarounds, no idea how
effective they will be:

*) Try deleteByQuery to see whether it is affected the same way.
*) Use autoSoftCommit in solrconfig.xml instead of commitWithin on the
update request.
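
For the second workaround, a minimal sketch of the relevant solrconfig.xml
snippet (the 5000 ms value simply mirrors the commitWithin interval used above):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>

A soft commit triggered this way should open a new searcher on every node, leader
and replica alike, rather than depending on how commitWithin is propagated.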

I do see a report of an identical problem on this mailing list, two days
after 4.0-ALPHA was announced, which was the first public release that
included SolrCloud.  Both of the following URLs open the same message:

http://osdir.com/ml/solr-user.lucene.apache.org/2012-07/msg00214.html
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3ccal3vrcdsiqyajuy6eqvpak0ftg-oy7n5g7cql4x4_8sz5jm...@mail.gmail.com%3E

I did not find an existing issue in Jira for this problem, so if the
same problem exists in 4.10.3, filing one sounds like a good idea.

Thanks,
Shawn



CoreContainer#createAndLoad, existing cores not loaded

2015-01-28 Thread Clemens Wyss DEV
My problem:
I create cores dynamically using container#create( CoreDescriptor ) and then 
add documents to the very core(s). So far so good.
When I restart my app I do
container = CoreContainer#createAndLoad(...)
but when I then call container.getAllCoreNames() an empty list is returned.

What cores should be loaded by the container if I call
CoreContainer#createAndLoad(...)
? Where does the container lookup the "existing cores"?


Re: [MASSMAIL]Re: "Contextual" sponsored results with Solr

2015-01-28 Thread Jorge Luis Betancourt González
We are trying to avoid firing 2 queries per request. I've started to play with
a PostFilter to see how it goes; perhaps something along the lines of the
ReRank query parser (ReRankQParserPlugin) could be used to avoid the two queries
and instead rerank the results?
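
For reference, a rough sketch of the rerank syntax (available since Solr 4.9); the
field name and the weights below are assumptions, not taken from the original setup:

q=search&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=host:wikipedia

Note that reranking only reorders the top reRankDocs results; it still does not cap
how many documents from the boosted host end up on the first page, which is the
limitation discussed below.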

- Original Message -
From: "Ahmet Arslan" 
To: solr-user@lucene.apache.org
Sent: Tuesday, January 27, 2015 11:06:29 PM
Subject: [MASSMAIL]Re: "Contextual" sponsored results with Solr

Hi Jorge,

We have done a similar thing with N=3. We issue two separate queries/requests and
display the 'special N' above the results.
We exclude the 'special N' from the main result list with a -id:(1 2 3 ... N) type
query; it is all done on the client side.

Ahmet



On Tuesday, January 27, 2015 8:28 PM, Jorge Luis Betancourt González 
 wrote:
Hi all,

Recently I got an interesting use case that I'm not sure how to implement: the
client wants a fixed number of documents, let's call it N, to appear at the top of
the results. Let me explain a little. We're working with web documents, and the
idea is to promote the documents from a given domain (wikipedia, for example) that
match the user's query to the top of the list. So if I apply a boost using the
boost parameter:

http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia

I get *all* the documents from the desired host at the top, but there is no way
of limiting the number of documents from that host that get boosted to the top
of the result list (which could lead to several pages of content from the same
host, which is not desired; the idea is to show only N). I was thinking of
something like field collapsing/grouping, but only for the documents that match
my $type1query parameter (host:wikipedia), and I don't see any way of
collapsing only one group while leaving the other results untouched.

I also thought about using 2 groups, group.query=host:wikipedia and
group.query=-host:wikipedia, but in this case there is no way of controlling
how many documents each group will have independently.

In this particular case the QueryElevationComponent does not help, because I
don't want to map all the possible queries; I just want to put some of the
results from a certain host at the top of the list, without boosting all
the documents from that host.

Any thoughts or recommendations on this? 

Thank you,

Regards,


---
XII Anniversary of the founding of the Universidad de las Ciencias Informáticas.
12 years of history alongside Fidel. December 12, 2014.





RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread Dyer, James
Try using something larger than 2 for alternativeTermCount.  5 is probably ok 
here.  If that doesn't work, then post the exact query you are using and the 
full extended spellcheck results.

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 3:59 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I have this in my solrconfig:

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">100</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Although my spellchecker does work, suggesting corrections for misspelled terms,
it doesn't work for the example above:
I mean terms which are both valid ("gopro" = 100 docs; "go pro" = 150 'other'
docs).
I want "gopro" to be suggested for the "go pro" search term and vice versa, even
though they're both perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
We're using Solr 4.8.0


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, January 27, 2015 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

What version of Solr? This is an ongoing area of improvements and several
are very recent.

Try searching the JIRA for Solr for details.

Best,
Erick

On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital  wrote:

> Hello,
>
> We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
> zookeeper instances. We have noticed that when a leader node goes down the
> replica never takes over as a leader, cloud becomes unusable and we have to
> bounce entire cloud for replica to assume leader role. Is this default
> behavior? How can we change this?
>
> Thanks.
>
>
>


Re: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory

2015-01-28 Thread Nitin Solanki
Ok.. I got the solution.
I changed the value of maxQueryFrequency from 0.01 (1%) to 0.9 (90%) and it is
working now. Thanks a lot.
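
For anyone following along, the change amounts to something like this in the
spellchecker definition in solrconfig.xml (the dictionary name and spellchecker
class here are assumptions, not taken from the original config):

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <float name="maxQueryFrequency">0.9</float>
  ...
</lst>

maxQueryFrequency is the maximum fraction of documents a query term may occur in
and still be given suggestions, so raising it from 0.01 allows fairly common
terms to be corrected as well.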

On Tue, Jan 27, 2015 at 8:55 PM, Dyer, James 
wrote:

> Can you give a little more information as to how you have the spellchecker
> configured in solrconfig.xml?  Also, it would help if you showed a query
> and the spell check response and then explain what you wanted it to return
> vs what it actually returned.
>
> My guess is that the stop words you mention exist in your spelling index
> and you're not using the "alternativeTermCount" parameter, which tells it
> to suggest for terms that exist in the index.
>
> I take it also you're using shingles to get word-break suggestions?  You
> might have better luck with this using WordBreakSolrSpellchecker instead of
> shingles.
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, January 27, 2015 5:06 AM
> To: solr-user@lucene.apache.org
> Subject: Stop word suggestions are coming when I indexed sentence using
> ShingleFilterFactory
>
> Hi,
>   I am getting suggestions for both correct words and misspelled
> words, but I am not getting stop word suggestions. Why? I am not even using
> solr.StopFilterFactory.
>
>
> Schema.xml :
>
> <field name="..." type="..." indexed="true" stored="true"
>        required="true" multiValued="false"/>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="..."
>             minShingleSize="2" outputUnigrams="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="..."
>             minShingleSize="2" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>


What is the best way to update an index?

2015-01-28 Thread Carl Roberts

Hi,

What is the best way to update an index with new data or records? Via 
this command:


curl 
"http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-2002";


or this command:

curl 
"http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&synchronous=true&entity=cve-2002";



Thanks,

Joe


Re: Solr facet search improvements

2015-01-28 Thread Jack Krupansky
It would probably be better to do entity extraction and normalization of
job titles as a front-end process before ingesting the data into Solr, but
you could also do it as a custom or script update processor. The latter can
easily be coded in JavaScript to run within Solr.

Your first step in any case will be to define the specific rules you wish
to use, both for normalization of job titles and for the actual matching. Yes,
you can do that in Solr, but you have to do it; Solr will not do it
magically for you. Also, post some specific query examples that completely
cover the range of queries you need to be able to handle.
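
As a rough sketch of the script-processor route (the chain name and script file
below are hypothetical, and the actual normalization rules would live in the
JavaScript file), solrconfig.xml would wire it in roughly like this:

<updateRequestProcessorChain name="normalize-titles">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">normalize-titles.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The script's processAdd(cmd) function can then read the raw title from
cmd.solrDoc, apply the normalization/exclusion rules, and write the cleaned-up
value into a separate field used for faceting, before the document is indexed.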

-- Jack Krupansky

On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush 
wrote:

> I have around 1 million job titles which are indexed on Solr and am looking
> to improve the faceted search results on job title matches.
>
> For example: a job search for *Research Scientist Computer Architecture* is
> made, and the facet field title which is tokenized in solr and gives the
> following results:
>
> 1. Senior Data Scientist
> 2. PARALLEL COMPUTING SOFTWARE ENGINEER
> 3. Engineer/Scientist 4
> 4. Data Scientist
> 5. Engineer/Scientist
> 6. Senior Research Scientist
> 7. Research Scientist-Wireless Networks
> 8. Research Scientist-Andriod Development
> 9. Quantum Computing Theorist Job
> 10.Data Sceintist Smart Analytics
>
> I want to be able to improve / optimize the job titles and be able to make
> exclusions and some normalizations. Is this possible with Solr? What is the
> best way to have more granular control over the facted search results ?
>
> For example *Engineer/Scientist 4* - is not useful and too specific and
> titles like *Quantum Computing theorist* would ideally also be excluded
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Morphology of synonyms

2015-01-28 Thread Shawn Heisey
On 1/28/2015 5:11 AM, Reinforcer wrote:
> Is Solr capable of using morphology for synonyms?
> 
> For example. Request: "inanely".
> Indexed text in Solr: "Searching keywords without morphology is fatuously".
> "inane" and "fatuous" are synonyms.
> 
> So, "inanely" ---morphology---> "inane" ---synonyms---> "fatuous"
> ---morphology---> "fatuously". Is this possible ("double morphology")?

Synonyms are handled via exact match.  The feature you are describing is
called stemming or lemmatization.

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

It is possible to combine stemming and synonyms in the same analysis
chain, but you must figure out what the root word is to put into your
synonym list.  It may not be what you expect.  For example, the English
stemmer will probably change "achieve" to "achiev" ... which looks
wrong, until you remember that stemming must be applied both at index
and query time, so the user will never see that form of the word.
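
A minimal sketch of such a chain (the field type name and file names are
illustrative, not from the original message):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

Because the synonym filter sits after the stemmer on the query side, the entries
in synonyms.txt have to be the stemmed forms of the words, which is exactly the
point about finding the root word made above; the analysis tab shows what those
stemmed forms are.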

Synonyms are usually only applied at either index or query time.  Which
one to choose depends on your requirements, but I believe it is
typically on the query side.

The analysis tab in the admin UI is invaluable for seeing the results of
changes in the analysis chain.

Thanks,
Shawn



Re: Running multiple full-import commands via curl in a script

2015-01-28 Thread Carl Roberts

Thanks Mikhail - synchronous=true works like a charm...:)
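
For the record, the working form of one of the commands ends up looking like this
(the same URL as in the original script, with the flag appended):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-2002"

With synchronous=true each curl call blocks until its import finishes, so the
commands in the script simply run one after the other.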

On 1/28/15, 5:16 AM, Mikhail Khludnev wrote:

Literally, a queue can be done by submitting as-is (async) and polling the
command status. However, given
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200
you can try to add &synchronous=true&... which should hold the request until
it's completed.
The other question is how to run requests in parallel, which is explicitly
prevented by
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173
The only workaround I can suggest is to duplicate the DIH definitions in the
solr config:
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
then those handlers should be able to handle their own requests in parallel.
Nasty stuff..
have a good hack

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts 
wrote:
Hi,

I am attempting to run all these curl commands from a script so that I can
put them in a crontab job, however, it seems that only the first one
executes and the other ones return with an error (below):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2013"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2014"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
full-import&clean=false&entity=cve-2015"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
delta-import&clean=false&entity=cve-last"

error:

*A command is still running...*

Question:  Is there a way to queue the other requests in Solr so that they
run as soon as the previous one is done?  If not, how would you recommend I
do this?

Many thanks in advance,

Joe









Re: Solr facet search improvements

2015-01-28 Thread Shawn Heisey
On 1/28/2015 3:56 AM, thakkar.aayush wrote:
> I have around 1 million job titles which are indexed on Solr and am looking
> to improve the faceted search results on job title matches.
> 
> For example: a job search for *Research Scientist Computer Architecture* is
> made, and the facet field title which is tokenized in solr and gives the
> following results:
> 
> 1. Senior Data Scientist 
> 2. PARALLEL COMPUTING SOFTWARE ENGINEER 
> 3. Engineer/Scientist 4 
> 4. Data Scientist 
> 5. Engineer/Scientist 
> 6. Senior Research Scientist 
> 7. Research Scientist-Wireless Networks 
> 8. Research Scientist-Andriod Development 
> 9. Quantum Computing Theorist Job 
> 10.Data Sceintist Smart Analytics
> 
> I want to be able to improve / optimize the job titles and be able to make
> exclusions and some normalizations. Is this possible with Solr? What is the
> best way to have more granular control over the facted search results ?
> 
> For example *Engineer/Scientist 4* - is not useful and too specific and
> titles like *Quantum Computing theorist* would ideally also be excluded

Normally, if the field is tokenized, you will not get the original
values in the facet.  You will get values like "senior" instead of
"Senior Data Scientist".  If DocValues are enabled on the field, then
you may well indeed get the original values.  I've never tried facets on
a tokenized field with DocValues, but everything I understand about the
feature says it would result in the original (not tokenized) values.

If you want different values in the facets, then you'll need to change
those values before they get indexed in Solr.  That can be done with
custom UpdateProcessor code embedded in the update chain, or you can
simply do the changes in your program that indexes the data in Solr.

Thanks,
Shawn



Re: Reindex data without creating new index.

2015-01-28 Thread Shawn Heisey
On 1/27/2015 11:54 PM, SolrUser1543 wrote:
> I want to reindex my data in order to change the value of one field according
> to the value of another (both fields already exist).
> 
> For this purpose I run the "clue" utility in order to get a list of IDs.
> Then I created an update processor which can set the value of field A
> according to the value of field B.
> I added a new request handler, like the classic update handler, but with a new
> update chain containing the new update processor.
> 
> I want to run an HTTP POST request for each ID, to the new handler, with the
> item ID only.
> This will trigger my update processor, which will fetch the existing doc from
> the index and apply the logic.
> 
> So in this way I can do some enrichment without a full data import and
> without creating a new index.
> 
> What do you think about it?
> Could it cause performance degradation? Can Solr handle it,
> or will it rebalance the index?
> Does Solr have some built-in feature which can do this?

This is likely possible, with some caveats.  You'll need to write all
the code yourself, extending the UpdateRequestProcessorFactory and
UpdateRequestProcessor classes.

This will be similar to the atomic update feature, so you'll likely need
to find that source code and model yours on its operation.  It will have
the same requirements -- all fields must be 'stored="true"' except those
which are copyField destinations, which must be 'stored="false"'.  With
Atomic Updates, this requirement is not *enforced*, but it must be met,
or there will be data loss.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
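
For comparison, a plain atomic update (with no custom processor) looks roughly
like this; the core name, document id and field name are placeholders only:

curl "http://127.0.0.1:8983/solr/mycore/update?commit=true" -H "Content-Type: application/json" -d '[{"id":"1234","fieldA":{"set":"new value"}}]'

Internally Solr fetches the stored fields of document 1234, applies the "set",
and reindexes the whole document, which is why the stored-field requirement
above matters.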

What do you mean by "rebalance" the index?  This could mean almost
anything, but most of the meanings I can come up with would not apply to
this situation at all.

The effect on Solr for each document you process will be the sum of: a
query for that document, a little work in the update processor itself,
and a reindex of that document.

Thanks,
Shawn



Morphology of synonyms

2015-01-28 Thread Reinforcer
Hi,

Is Solr capable of using morphology for synonyms?

For example. Request: "inanely".
Indexed text in Solr: "Searching keywords without morphology is fatuously".
"inane" and "fatuous" are synonyms.

So, "inanely" ---morphology---> "inane" ---synonyms---> "fatuous"
---morphology---> "fatuously". Is this possible ("double morphology")?


Best regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Morphology-of-synonims-tp4182517.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr facet search improvements

2015-01-28 Thread thakkar.aayush
I have around 1 million job titles which are indexed on Solr and am looking
to improve the faceted search results on job title matches.

For example: a job search for *Research Scientist Computer Architecture* is
made, and the facet on the title field, which is tokenized in Solr, gives the
following results:

1. Senior Data Scientist 
2. PARALLEL COMPUTING SOFTWARE ENGINEER 
3. Engineer/Scientist 4 
4. Data Scientist 
5. Engineer/Scientist 
6. Senior Research Scientist 
7. Research Scientist-Wireless Networks 
8. Research Scientist-Andriod Development 
9. Quantum Computing Theorist Job 
10.Data Sceintist Smart Analytics

I want to be able to improve / optimize the job titles, and to be able to make
exclusions and some normalizations. Is this possible with Solr? What is the
best way to have more granular control over the faceted search results?

For example, *Engineer/Scientist 4* is not useful and too specific, and
titles like *Quantum Computing theorist* would ideally also be excluded.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Running multiple full-import commands via curl in a script

2015-01-28 Thread Mikhail Khludnev
Literally, a queue can be done by submitting as-is (async) and polling the
command status. However, given
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200
you can try to add &synchronous=true&... which should hold the request until
it's completed.
The other question is how to run requests in parallel, which is explicitly
prevented by
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173
The only workaround I can suggest is to duplicate the DIH definitions in the
solr config:
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
  <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler"> ... </requestHandler>
then those handlers should be able to handle their own requests in parallel.
Nasty stuff..
have a good hack

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts  wrote:

> Hi,
>
> I am attempting to run all these curl commands from a script so that I can
> put them in a crontab job, however, it seems that only the first one
> executes and the other ones return with an error (below):
>
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2002"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2003"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2004"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2005"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2006"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2007"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2008"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2009"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2010"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2011"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2012"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2013"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2014"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> full-import&clean=false&entity=cve-2015"
> curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=
> delta-import&clean=false&entity=cve-last"
>
> error:
>
> *A command is still running...*
>
> Question:  Is there a way to queue the other requests in Solr so that they
> run as soon as the previous one is done?  If not, how would you recommend I
> do this?
>
> Many thanks in advance,
>
> Joe
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Reading data from another solr core

2015-01-28 Thread Alvaro Cabrerizo
Hi,

I usually use the SolrEntityProcessor for moving/transforming data between
cores; it's a piece of cake!
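
A minimal sketch of what the data-config.xml for that can look like (the core
names, query and field list are illustrative only):

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://127.0.0.1:8983/solr/source-core"
            query="*:*" rows="500" fl="id,title,body"/>
  </document>
</dataConfig>

Running a full-import on the target core then pulls the documents matching the
query from the source core and re-indexes them through the target core's own
schema and analysis.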

Regards.

On Wed, Jan 28, 2015 at 8:13 AM, solrk  wrote:

> Hi Guys,
>
> I have multiple cores set up in my Solr server. I would like to read/import
> data from one core (source) into another core (target) and index it. Is there
> an easy way in Solr to do so?
>
> I was thinking of using SolrEntityProcessor for this purpose.. any other
> suggestions are appreciated..
>
> http://blog.trifork.com/2011/11/08/importing-data-from-another-solr/
>
> For example:
>
> <dataConfig>
>   <document>
>     <entity name="..."
>       url=""
>       processor="XPathEntityProcessor">
>       ...
>     </entity>
>
>     <entity name="..."
>       url="http://127.0.0.1:8081/solr/core2">
>       ...
>     </entity>
>   </document>
> </dataConfig>
>
> Please suggest if there is a better solution, or should I write a new
> processor which reads the index of another core?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>