Re: choosing placement upon RESTORE

2017-05-02 Thread xavier jmlucjav
Thanks Mikhail, that sounds like it would help me, as it allows you to set
createNodeSet on RESTORE calls.
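
For reference, this is roughly the call it would enable (a sketch only, not
verified: it assumes createNodeSet is accepted by RESTORE, which is what
SOLR-9527 proposes; the host, backup name, collection, location and node
name are placeholders):

curl "http://nodeA:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=mycollection&location=/backups/solr&createNodeSet=nodeA:8983_solr"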

On Tue, May 2, 2017 at 2:50 PM, Mikhail Khludnev  wrote:

> This sounds relevant, but different to https://issues.apache.org/
> jira/browse/SOLR-9527
> You may want to follow this ticket.
>
> On Mon, May 1, 2017 at 9:15 PM, xavier jmlucjav 
> wrote:
>
>> hi,
>>
>> I am facing this situation:
>> - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's
>> just
>> for dev work)
>> - the collections where created with:
>>action=CREATE&...&createNodeSet=EMPTY"
>> then
>>   action=ADDREPLICA&...&node=$NODEA&dataDir=$DATADIR"
>> - I have taken a BACKUP of the collections
>> - Solr is upgraded to 6.5.1
>>
>> Now, I started using RESTORE to restore the collections on the node A
>> (where they lived before), but, instead of all being created in node A,
>> collections have been created in A, then B, then C nodes. Well, Solrcloud
>> tried to, as 2nd and 3rd RESTOREs failed, as the backup was in node A's
>> disk, not reachable from nodes B and C.
>>
>> How is this supposed to work? I am looking at Rule Based Placement but it
>> seems it is only available for CREATESHARD, so I can use it in RESTORE?
>> Isn't there a way to force Solrcloud to create the collection in a given
>> node?
>>
>> thanks!
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


choosing placement upon RESTORE

2017-05-01 Thread xavier jmlucjav
hi,

I am facing this situation:
- I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's just
for dev work)
- the collections were created with:
   action=CREATE&...&createNodeSet=EMPTY"
then
  action=ADDREPLICA&...&node=$NODEA&dataDir=$DATADIR"
- I have taken a BACKUP of the collections
- Solr is upgraded to 6.5.1

Now, I started using RESTORE to restore the collections on node A (where
they lived before), but instead of all of them being created on node A,
the collections were created on nodes A, then B, then C. Well, SolrCloud
tried to: the 2nd and 3rd RESTOREs failed, because the backup was on node
A's disk, which is not reachable from nodes B and C.

How is this supposed to work? I am looking at Rule-Based Placement, but it
seems it is only available for CREATESHARD, so can I use it in RESTORE?
Isn't there a way to force SolrCloud to create the collection on a given
node?

thanks!


DIH: last_index_time not updated if 0 docs updated

2017-02-27 Thread xavier jmlucjav
Hi,

After making our interval for running the delta index shorter and shorter,
I have found out that last_index_time in dataimport.properties is not
updated every time the indexing runs: the update is skipped if no docs were
added.

This happens at least in the following scenario:
- running delta as full index
( /dataimport?command=full-import&clean=false&commit=true )
- Solrcloud setup, so dataimport.properties is in zookeeper
- Solr 5.5.0

I understand that skipping the commit on the index if no docs were updated
is a nice optimization, but I believe the last_index_time info should be
updated in all cases, so that it reflects reality. We, for instance, look
at this piece of information in order to drive other processes.
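
For what it's worth, in SolrCloud the DIH properties live in ZooKeeper, so
the current value can be inspected with the zkcli script that ships with
Solr (a sketch only; the ZooKeeper address and config name are
placeholders, and the exact ZK path may differ per setup):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd getfile /configs/myconfig/dataimport.properties /tmp/dataimport.properties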

I could not find any mention of this in Jira, so I wonder if this is
intended or if nobody has had an issue with it yet?

xavier


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
hi Shawn,

As I replied to Markus, of course I know (and use) the Collections API to
reload the config. I am asking what would happen in this scenario:
 - config updated (but collection not reloaded)
 - I restart one node
Now one node has the new config and the rest the old one??

To which he already replied:
>The restarted/reloaded node has the new config, the others have the old
config until reloaded/restarted.
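
So to get every node onto the new config after a change, a collection
RELOAD is the way to go; a minimal call (host and collection name are
placeholders):

curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collectionC"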

I was not asking about making Solr restart itself; my English must be
worse than I thought. By the way, stuff like that can be achieved with
http://yajsw.sourceforge.net/, a very powerful Java wrapper. I used to use
it when Solr did not have a built-in daemon setup. It was built by someone
who was using JSW and got pissed when that one went commercial. It is very
configurable, but of course more complex. I wrote something about it some
time ago:
https://medium.com/@jmlucjav/how-to-install-solr-as-a-service-in-any-platform-including-solr-5-8e4a93cc3909

thanks

On Thu, Feb 9, 2017 at 4:53 PM, Shawn Heisey  wrote:

> On 2/9/2017 5:24 AM, xavier jmlucjav wrote:
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the rest alone. If all the nodes contain a shard for C, what happens, N1 is
> using the new config and the rest are not? how is this handled?
>
> If you want to change the config or schema for a collection and make it
> active across all nodes, just use the collections API to RELOAD the
> collection.  The change will be picked up everywhere.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> To answer your question: No.  Solr does not have the ability to restart
> itself.  It would require significant development effort and a
> fundamental change in how Solr is started to make it possible.  It is
> something that has been discussed, but at this time it is not possible.
>
> One idea that would make this possible is mentioned on the following
> wiki page.  It talks about turning Solr into two applications instead of
> one:
>
> https://wiki.apache.org/solr/WhyNoWar#Information_that.27s_
> not_version_specific
>
> Again -- it would not be easy, which is why it hasn't been done yet.
>
> Thanks,
> Shawn
>
>


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi Markus,

Yes, of course I know (and use) the Collections API to reload the config.
I am asking what would happen in this scenario:
- config updated (but collection not reloaded)
- I restart one node

Now one node has the new config and the rest the old one??

Regarding restarting many hosts, my question is whether we can just
'restart' each Solr in turn and that is enough, or whether it is better to
first stop all of them and then start them all.
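
In other words, whether a naive loop like this is safe (a sketch only; the
host names, install path and ZooKeeper string are assumptions):

for h in nb1 nb2 nb3; do
  ssh "$h" "/opt/solr/bin/solr restart -c -z zk1:2181,zk2:2181,zk3:2181 -p 8983"
done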

thanks


On Thu, Feb 9, 2017 at 1:28 PM, Markus Jelsma 
wrote:

> Hello - if you just want to use updated configuration, you can use Solr's
> collection reload API call. For restarting we rely on remote provisioning
> tools such as Salt, other managing tools can probably execute commands
> remotely as well.
>
> If you operate more than just a very few machines, i'd really recommend
> using these tools.
>
> Markus
>
>
>
> -Original message-
> > From:xavier jmlucjav 
> > Sent: Thursday 9th February 2017 13:24
> > To: solr-user 
> > Subject: procedure to restart solrcloud, and config/collection
> consistency
> >
> > Hi,
> >
> > When I need to restart a Solrcloud cluster, I always do this:
> > - log in into host nb1, stop solr
> > - log in into host nb2, stop solr
> > -...
> > - log in into host nbX, stop solr
> > - verify all hosts did stop
> > - in host nb1, start solr
> > - in host nb12, start solr
> > -...
> >
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the
> > rest alone. If all the nodes contain a shard for C, what happens, N1 is
> > using the new config and the rest are not? how is this handled?
> >
> > thanks
> > xavier
> >
>


procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi,

When I need to restart a Solrcloud cluster, I always do this:
- log in into host nb1, stop solr
- log in into host nb2, stop solr
-...
- log in into host nbX, stop solr
- verify all hosts did stop
- in host nb1, start solr
- in host nb2, start solr
-...

I always wondered if this is really needed, or whether I could just call
'restart' on every node in a quick loop and forget about it. Does anyone
know if this is the case?

My doubt is about changing some config and then doing the above (just
restarting nodes in a loop). For example, what if I change a config G used
in collection C and restart just one of the nodes (N1), leaving the rest
alone? If all the nodes contain a shard of C, what happens: is N1 using the
new config while the rest are not? How is this handled?

thanks
xavier


reuse a org.apache.lucene.search.Query in Solrj?

2017-01-05 Thread xavier jmlucjav
Hi,

I have a Lucene Query (a BooleanQuery with a bunch of possibly complex
spatial queries, even polygons etc.) that I am building for some
MemoryIndex stuff.

Now I need to add that same query to a Solr query (adding it to the bunch
of other fq I am using). Is there some way to piggyback the Lucene query
this way? It would be extremely handy in my situation.

thanks
xavier


solrj: get to which shard a id will be routed

2016-12-22 Thread xavier jmlucjav
Hi

Is there somewhere a sample of some solrj code that given:
- a collection
- the id (like "IBM!12345")

returns the shard the doc will be routed to? I was hoping to get that info
from CloudSolrClient itself, but it does not expose it as far as I can
see.

thanks
xavier


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
done, with simple patch https://issues.apache.org/jira/browse/SOLR-9697

On Thu, Oct 27, 2016 at 4:21 PM, xavier jmlucjav  wrote:

> sure, will do, I tried before but I could not create a Jira, now I can,
> not sure what was happening.
>
> On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
>> out soon and we'd have to hurry if this fix has to go in.
>>
>> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav 
>> wrote:
>>
>> > Correcting myself here, I was wrong about the cause (I had already
>> messed
>> > with the script).
>> >
>> > I made it work by commenting out line 1261 (the number might be a bit
>> off
>> > as I have modified the script, but hopefully its easy to see where):
>> >
>> > ) ELSE IF "%1"=="/?" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-h" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-help" (
>> >   goto zk_usage
>> > ) ELSE IF "!ZK_SRC!"=="" (
>> >   if not "%~1"=="" (
>> > goto set_zk_src
>> >   )
>> >  * rem goto zk_usage*
>> > ) ELSE IF "!ZK_DST!"=="" (
>> >   IF "%ZK_OP%"=="cp" (
>> > goto set_zk_dst
>> >   )
>> >   IF "%ZK_OP%"=="mv" (
>> > goto set_zk_dst
>> >   )
>> >   set ZK_DST="_"
>> > ) ELSE IF NOT "%1"=="" (
>> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
>> >
>> > Now upconfig works!
>> >
>> > thanks
>> > xavier
>> >
>> >
>> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav 
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > Am I missing something or this is broken in windows? I cannot
>> upconfig,
>> > > the scripts keeps exiting immediately and showing usage, as if I use
>> some
>> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
>> it
>> > > also on win7 (don't have that around anymore to try)
>> > >
>> > > I think the issue is: there is a SHIFT too much in line 1276 of
>> solr.cmd:
>> > >
>> > > :set_zk_op
>> > > set ZK_OP=%~1
>> > > SHIFT
>> > > goto parse_zk_args
>> > >
>> > > if this SHIFT is removed, then parse_zk_args works (and it does the
>> shift
>> > > itself). But the upconfig hangs, so still it does not work.
>> > >
>> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
>> 4dc7576d8e
>> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
>> > issue.
>> > > Would be great if this was fixed in the incoming 6.3...
>> > >
>> > > My cmd scripting is not too strong and I did not go further. I
>> searched
>> > > Jira but found nothing. By the way is it not possible to open tickets
>> in
>> > > Jira anymore?
>> > >
>> > > xavier
>> > >
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
Sure, will do. I tried before but I could not create a Jira; now I can,
not sure what was happening.

On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
> out soon and we'd have to hurry if this fix has to go in.
>
> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav 
> wrote:
>
> > Correcting myself here, I was wrong about the cause (I had already messed
> > with the script).
> >
> > I made it work by commenting out line 1261 (the number might be a bit off
> > as I have modified the script, but hopefully its easy to see where):
> >
> > ) ELSE IF "%1"=="/?" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-h" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-help" (
> >   goto zk_usage
> > ) ELSE IF "!ZK_SRC!"=="" (
> >   if not "%~1"=="" (
> > goto set_zk_src
> >   )
> >  * rem goto zk_usage*
> > ) ELSE IF "!ZK_DST!"=="" (
> >   IF "%ZK_OP%"=="cp" (
> > goto set_zk_dst
> >   )
> >   IF "%ZK_OP%"=="mv" (
> > goto set_zk_dst
> >   )
> >   set ZK_DST="_"
> > ) ELSE IF NOT "%1"=="" (
> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
> >
> > Now upconfig works!
> >
> > thanks
> > xavier
> >
> >
> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav 
> > wrote:
> >
> > > hi,
> > >
> > > Am I missing something or this is broken in windows? I cannot upconfig,
> > > the scripts keeps exiting immediately and showing usage, as if I use
> some
> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
> it
> > > also on win7 (don't have that around anymore to try)
> > >
> > > I think the issue is: there is a SHIFT too much in line 1276 of
> solr.cmd:
> > >
> > > :set_zk_op
> > > set ZK_OP=%~1
> > > SHIFT
> > > goto parse_zk_args
> > >
> > > if this SHIFT is removed, then parse_zk_args works (and it does the
> shift
> > > itself). But the upconfig hangs, so still it does not work.
> > >
> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
> 4dc7576d8e
> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
> > issue.
> > > Would be great if this was fixed in the incoming 6.3...
> > >
> > > My cmd scripting is not too strong and I did not go further. I searched
> > > Jira but found nothing. By the way is it not possible to open tickets
> in
> > > Jira anymore?
> > >
> > > xavier
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
Correcting myself here, I was wrong about the cause (I had already messed
with the script).

I made it work by commenting out line 1261 (the number might be a bit off
as I have modified the script, but hopefully it's easy to see where):

) ELSE IF "%1"=="/?" (
  goto zk_usage
) ELSE IF "%1"=="-h" (
  goto zk_usage
) ELSE IF "%1"=="-help" (
  goto zk_usage
) ELSE IF "!ZK_SRC!"=="" (
  if not "%~1"=="" (
goto set_zk_src
  )
  rem goto zk_usage
) ELSE IF "!ZK_DST!"=="" (
  IF "%ZK_OP%"=="cp" (
goto set_zk_dst
  )
  IF "%ZK_OP%"=="mv" (
goto set_zk_dst
  )
  set ZK_DST="_"
) ELSE IF NOT "%1"=="" (
  set ERROR_MSG="Unrecognized or misplaced zk argument %1%"

Now upconfig works!

thanks
xavier


On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav  wrote:

> hi,
>
> Am I missing something or this is broken in windows? I cannot upconfig,
> the scripts keeps exiting immediately and showing usage, as if I use some
> wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw it
> also on win7 (don't have that around anymore to try)
>
> I think the issue is: there is a SHIFT too much in line 1276 of solr.cmd:
>
> :set_zk_op
> set ZK_OP=%~1
> SHIFT
> goto parse_zk_args
>
> if this SHIFT is removed, then parse_zk_args works (and it does the shift
> itself). But the upconfig hangs, so still it does not work.
>
> this probably was introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e
> by Erick (not sure which one :) ) on July 2nd. Master still has this issue.
> Would be great if this was fixed in the incoming 6.3...
>
> My cmd scripting is not too strong and I did not go further. I searched
> Jira but found nothing. By the way is it not possible to open tickets in
> Jira anymore?
>
> xavier
>


'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
hi,

Am I missing something or is this broken on Windows? I cannot upconfig;
the script keeps exiting immediately and showing the usage message, as if
I had used some wrong parameters. This is on Win10, JDK 8, but I am pretty
sure I saw it also on Win7 (I don't have that around anymore to try).
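
For reference, this is the kind of invocation that fails (run via
bin\solr.cmd on Windows; the ZooKeeper address, config name and config
directory are placeholders, and the equivalent command works fine on
Linux):

bin/solr zk upconfig -z localhost:2181 -n myconfig -d server/solr/configsets/basic_configs/conf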

I think the issue is: there is one SHIFT too many at line 1276 of solr.cmd:

:set_zk_op
set ZK_OP=%~1
SHIFT
goto parse_zk_args

If this SHIFT is removed, then parse_zk_args works (and it does the shift
itself), but then the upconfig hangs, so it still does not work.

This was probably introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e by
Erick (not sure which one :) ) on July 2nd. Master still has this issue. It
would be great if this were fixed in the upcoming 6.3...

My cmd scripting is not too strong and I did not go further. I searched
Jira but found nothing. By the way, is it not possible to open tickets in
Jira anymore?

xavier


Re: JNDI settings

2016-09-26 Thread xavier jmlucjav
I did set up JNDI for DIH once, and you have to tweak the Jetty setup. Of
course, Solr now ships with its own Jetty instance; the old way of
deploying it as just a war is not supported anymore. I don't remember
where, but there should be some instructions somewhere; it took me an
afternoon to set it up properly.

xavier

On Wed, Sep 21, 2016 at 1:15 PM, Aristedes Maniatis 
wrote:

> On 13/09/2016 1:29am, Aristedes Maniatis wrote:
> > I am using Solr 5.5 and wanting to add JNDI settings to Solr (for data
> import). I'm new to Solr Cloud setup (previously I was running Solr running
> as a custom bundled war) so I can't figure where to put the JNDI settings
> with user/pass themselves.
> >
> > I don't want to add it to jetty.xml because that's part of the packaged
> application which will be upgraded from time to time.
> >
> > Should it go into solr.xml inside the solr.home directory? If so, what's
> the right syntax there?
>
>
> Just a follow up on this question. Does anyone know of how I can add JNDI
> settings to Solr without overwriting parts of the application itself?
>
> Cheers
> Ari
>
>
>
> --
> -->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>


Re: issue transplanting standalone core into solrcloud (plus upgrade)

2016-09-26 Thread xavi jmlucjav
I guess there is no other way than to reindex:
- of course, not all fields are stored, that would have been too easy
- it might (??) work if, as Jan says, I build a custom Solr version with
the removed IntField etc. added back, but going down this rabbit hole
sounds too risky: too much work, and I am not sure it would eventually
work, especially considering the last point:
- I did not get any response to this, but my understanding now is that you
cannot take a standalone Solr core's /data directory (without a _version_
field) and put it into a SolrCloud setup, as _version_ is needed.

xavier

On Mon, Sep 26, 2016 at 9:21 PM, Jan Høydahl  wrote:

> If all the fields in your current schema has stored=“true”, you can try to
> export
> the full index to an XML file which can then be imported into 6.1.
> If some fields are not stored you will only be able to recover the
> inverted index
> representation of that data, which may not be enough to recreate the
> original
> data (or in some cases maybe it is enough).
>
> If you share a copy of your old schema.xml we may be able to help.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 26. sep. 2016 kl. 20.39 skrev Shawn Heisey :
> >
> > On 9/26/2016 6:28 AM, xavi jmlucjav wrote:
> >> Yes, I had to change some fields, basically to use TrieIntField etc
> >> instead
> >> of the old IntField. I was assuming by using the IndexUpgrader to
> upgrade
> >> the data to 6.1, the older IntField would work with the new
> TrieIntField.
> >> But I have tried loading the upgraded data into a standalone 6.1 and I
> am
> >> hitting the same issue, so this is not related to _version_ field (more
> on
> >> that below). Forget about solrcloud for now, having an old 3.6 index,
> >> should it be possible to use IndexUpgrader and load it on 6.1? How would
> >> one need to handle IntFields etc?
> >
> > The only option when you change the class on a field in your schema is
> > to wipe the index and rebuild it.  TrieIntField uses a completely
> > different on-disk data format than IntField did.  The two formats simply
> > aren't compatible.  This is not a bug, it's a fundamental fact of Lucene
> > indexes.
> >
> > Lucene doesn't use a schema -- that's a Solr concept.  IndexUpgrader is
> > a Lucene program that doesn't know what kind of data each field
> > contains, it just reaches down into the old index format, grabs the
> > internal data in each field, and copies it to a new index using the new
> > format.  The internal data must still be consistent with the Lucene
> > program for the index to work in a new version.  When you're running
> > Solr, it uses the schema to know how to read the index.
> >
> > In 5.x and 6.x, IntField does not exist, and attempting to read that
> > data using TrieIntField will not work.
> >
> > The luceneMatchVersion setting in solrconfig.xml can cause certain
> > components (tokenizers and filters mainly) to revert to old behavior in
> > the previous major version.  Version 6.x doesn't hold onto behavior from
> > 3.x and 4.x -- it can only revert behavior back to 5.x versions.
> >
> > The luceneMatchVersion setting cannot bring back removed classes like
> > IntField, and it does NOT affect the on-disk index format.
> >
> > Your particular situation will require a full reindex.  It is not
> > possible to upgrade an index using those old class types.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: issue transplanting standalone core into solrcloud (plus upgrade)

2016-09-26 Thread xavi jmlucjav
Hi Shawn/Jan,

On Sun, Sep 25, 2016 at 6:18 PM, Shawn Heisey  wrote:

> On 9/25/2016 4:24 AM, xavi jmlucjav wrote:
> > Everything went well, no errors when solr restarted, the collections
> shows
> > the right number of docs. But when I try to run a query, I get:
> >
> > null:java.lang.NullPointerException
>
> Did you change any of the fieldType class values as you adjusted the
> schema for the upgrade?  A number of classes that were valid and
> deprecated in 3.6 and 4.x were completely removed by 5.x, and 6.x
> probably removed a few more.
>

Yes, I had to change some fields, basically to use TrieIntField etc.
instead of the old IntField. I was assuming that by using IndexUpgrader to
upgrade the data to 6.1, the older IntField data would work with the new
TrieIntField. But I have tried loading the upgraded data into a standalone
6.1 and I am hitting the same issue, so this is not related to the
_version_ field (more on that below). Forget about SolrCloud for now:
given an old 3.6 index, should it be possible to run IndexUpgrader on it
and load it in 6.1? How would one need to handle IntField etc.?



>
> If you did make changes like this to your schema, then what's in the
> index will no longer match the schema, and the *only* option is a
> reindex.  Exceptions are likely if you don't reindex after schema
> changes to the class value(s) or the index analyzer(s).
>
> Regarding the _version_ field:  SolrCloud expects this field to be in
> your schema.  It might also expect that that every document in the index
> will already contain a value in this field.  Adding _version_ to your
> schema should be treated similarly to the changes mentioned above -- a
> reindex is required for proper operation.
>
> Even if the schema didn't change in a way that *requires* a reindex ...
> the number of changes to the analysis components across three major
> version jumps is quite large.  Solr might not work as expected because
> of those changes unless you reindex, even if you don't see any
> exceptions.  Changes to your schema because of changes in analysis
> component behavior might  be required -- which is another situation that
> usually requires a reindex.
>
> Because of these potential problems, I always start a new Solr version
> with no index data and completely rebuild my indexes after an upgrade.
> That is the best way to ensure success.
>

I am totally aware of all the advantages of reindexing, sure. And that is
what I always do; this time though, it seems the original data is not
available...


> You referenced a mailing list thread where somebody had success
> converting non-cloud to cloud... but that was on version 4.8.1, two
> major versions back from the version you're running.  They also did not
> upgrade major versions -- from some things they said at the beginning of
> the thread, I know that the source version was at least 4.4.  The thread
> didn't mention any schema changes, either.
>
> If the schema doesn't change at all, moving from non-cloud to cloud is
> very possible, but if the schema changes, the index data might not match
> the schema any more, and that situation will not work.
>
Since you jumped three major versions, it's almost guaranteed that your
> schema *did* change, and the changes may have been more extensive than
> just adding the _version_ field.
>
> It's possible that there's a problem when converting a non-cloud install
> with no _version_ field to a cloud install where the only schema change
> is adding the _version_ field.  We can treat THAT situation as a bug,
> but if there are other schema changes besides adding _version_, the
> exception you encountered is most likely not a bug.
>


There are two orthogonal issues here:
A. Moving to SolrCloud from standalone without reindexing, and without
having a _version_ field already indexed, of course. Is this even possible?
From the thread above, I understood it was possible, but you say that
SolrCloud expects _version_ to be there, with values, so that makes this
move totally impossible without reindexing. This should be made clear
somewhere in the docs. I understand it is not a frequent scenario, but it
will be a deal breaker when it happens. So far the only thing I found is
the aforementioned thread, which, if I am not misreading it, makes it sound
as if it will work ok.

B. Upgrading from a very old 3.6 version to 6.1 without reindexing: it
seems I am hitting an issue with this first. Even if this were resolved, I
would not be able to achieve my goal due to A, but it would be good to know
how to get this done too, if possible.

Jan: I tried tweaking luceneMatchVersion too, no luck though.
xavier


>
> Thanks,
> Shawn
>
>


issue transplanting standalone core into solrcloud (plus upgrade)

2016-09-25 Thread xavi jmlucjav
Hi,

I have an existing 3.6 standalone installation. It has to be moved to
SolrCloud 6.1.0. Reindexing is not an option, so I did the following:

- Use IndexUpgrader to upgrade 3.6 -> 4.4 -> 5.5 (see the sketch below). I
did not upgrade to 6.x, as 5.5 should be readable by 6.x
- Install a SolrCloud 6.1 cluster
- Modify schema/solrconfig for cloud support (add _version_, tlog, etc.)
- Follow the method mentioned here
http://lucene.472066.n3.nabble.com/Copy-existing-index-from-standalone-Solr-to-Solr-cloud-td4149920.html
I did not find any other doc on how to transplant a standalone core into
SolrCloud.
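
For reference, each IndexUpgrader hop looked roughly like this (a sketch
only: the jar versions, classpath separator and index path are
placeholders; the 3.6 -> 4.4 hop is the same idea with the 4.x jars):

java -cp lucene-core-5.5.0.jar:lucene-backward-codecs-5.5.0.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /var/solr/data/core1/data/index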

Everything went well, no errors when Solr restarted, and the collection
shows the right number of docs. But when I try to run a query, I get:

null:java.lang.NullPointerException
at
org.apache.lucene.util.LegacyNumericUtils.prefixCodedToLong(LegacyNumericUtils.java:189)
at org.apache.solr.schema.TrieField.toObject(TrieField.java:155)
at org.apache.solr.schema.TrieField.write(TrieField.java:324)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:133)
at
org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:345)
at
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:249)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:151)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:731)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)

I was wondering how the non-existence of the _version_ field would be
handled, but as that thread above said, it would work.
Can anyone shed some light?

thanks


Re: cursorMark and CSVResponseWriter for mass reindex

2016-06-20 Thread xavi jmlucjav
Hi Erick,

Ah, yes, I guess you are correct in that I could just avoid using
cursorMark this way... the only (smallish, I think) issue is that I would
need to extract the last id from the CSV output. Oh, and I am using
Datastax DSE, so the uniqueKey is a combination of two fields... but I
think I can manage to use a field I know is unique, even if it's not the
uniqueKey.
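
A sketch of that export loop, assuming a unique "id" field that happens to
be the first CSV column (the host and collection name are placeholders):

LAST=""
while true; do
  if [ -z "$LAST" ]; then FQ="*:*"; else FQ="{!cache=false}id:{$LAST TO *]"; fi
  curl -s "http://localhost:8983/solr/oldcollection/select" \
    --data-urlencode "q=*:*" \
    --data-urlencode "fq=$FQ" \
    --data-urlencode "sort=id asc" \
    --data-urlencode "rows=1000000" \
    --data-urlencode "wt=csv" > chunk.csv
  [ "$(wc -l < chunk.csv)" -le 1 ] && break   # only the CSV header left: done
  LAST=$(tail -n 1 chunk.csv | cut -d, -f1)   # assumes id is the first column
  # ... feed chunk.csv to the new collection's /update handler here ...
done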

thanks!
Xavier



On Tue, Jun 21, 2016 at 2:13 AM, Erick Erickson 
wrote:

> The CursorMark stuff has to deal with shards, what happens when more
> than one document on different shards has the same sort value, what
> if all the docs in the response packet have the same sort value, what
> happens when you want to return docs by score and the like.
>
> For your case you can use a sort criteria that avoids all these issues and
> be
> OK. You can think of it as a specialized CursorMark.
>
> You should be able to just sort by
> <uniqueKey> and send each query through with a range filter query,
> so the first query would look something like (assuming "id" is your
> <uniqueKey>)
>
> q=*:*&sort=id asc&start=0&rows=1000
> then the rest would be
> q=*:*&sort=id asc&fq={!cache=false}id:[last_id_returned_from_previous_query
> TO *]&start=0&rows=1000
>
> this avoids the "deep paging" problem that CursorMark solves more cheaply
> because the  guarantees that there is one and only one doc with
> that value. Note that the start parameter is always 0.
>
> Or your second query could even be just
> q=id:[last_id_returned_from_previous_query TO *]&sort=id
> asc&start=0&rows=1000
>
> Best,
> Erick
>
> On Mon, Jun 20, 2016 at 12:37 PM, xavi jmlucjav 
> wrote:
> > Hi,
> >
> > I need to index into a new schema 800M docs, that exist in an older solr.
> > As all fields are stored, I thought I was very lucky as I could:
> >
> > - use wt=csv
> > - combined with cursorMark
> >
> > to easily script out something that would export/index in chunks of 1M
> docs
> > or something. CVS output being very efficient for this sort of thing, I
> > think.
> >
> > But, sadly I found that there is no way to get the nextcursorMark after
> the
> > first request, as the csvwriter just outputs plailn csv info of the
> fields,
> > excluding all other info on the response!!!
> >
> > This is so unfortunate, as csv/cursorMark seem like the perfect fit to
> > reindex this huge index (it's a one time thing).
> >
> > Does anyone see some way to still be able to use this? I would prefer not
> > having to write some java code just to get the nextcursorMark.
> >
> > So far I thought of:
> > - use json, but I need to postprocess returned json to remove the
> response
> > info etc, before reindexing, a pain.
> > - send two calls for each chunk (sending the same cursormark both times),
> > one wt=csv to get the data, another wt=json to get cursormark (and ignore
> > the data, maybe using fl=id only to avoid getting much data). I did some
> > test and this seems should work.
> >
> > I guess I will go with the 2nd, but anyone has a better idea?
> > thanks
> > xavier
>


cursorMark and CSVResponseWriter for mass reindex

2016-06-20 Thread xavi jmlucjav
Hi,

I need to index 800M docs, which exist in an older Solr, into a new
schema. As all fields are stored, I thought I was very lucky, as I could:

- use wt=csv
- combined with cursorMark

to easily script out something that would export/index in chunks of 1M docs
or so. CSV output is very efficient for this sort of thing, I think.

But, sadly, I found that there is no way to get the nextCursorMark after
the first request, as the CSVResponseWriter just outputs plain CSV data for
the fields, excluding all other info in the response!

This is so unfortunate, as csv/cursorMark seem like the perfect fit to
reindex this huge index (it's a one time thing).

Does anyone see some way to still be able to use this? I would prefer not
having to write some Java code just to get the nextCursorMark.

So far I have thought of:
- use JSON, but then I need to postprocess the returned JSON to remove the
response info etc. before reindexing, which is a pain.
- send two calls for each chunk (sending the same cursorMark both times),
one with wt=csv to get the data, another with wt=json to get the
cursorMark (and ignore the data, maybe using fl=id only to avoid fetching
much data). I did some tests and it seems this should work.

I guess I will go with the 2nd, but anyone has a better idea?
thanks
xavier


Re: issues using BlendedInfixLookupFactory in solr5.5

2016-06-02 Thread jmlucjav
hey Arcadius,

sorry I missed your reply and just saw it now. Thanks for the answers! I
will need to use some of those advanced settings for the suggesters, so
I'll have more questions/comments, and hopefully some fixes too (for
example for SOLR-8928 if I have the time)

xavi

On Thu, May 12, 2016 at 12:30 AM, Arcadius Ahouansou 
wrote:

> Hi Xavi.
>
> The blenderType=linear not working has been introduced in
> https://issues.apache.org/jira/browse/LUCENE-6939
>
> "linear" has been refactored to "position_linear"
>
> I would be grateful if a committer could help update the wiki with the
> comments at
>
>
>
> https://issues.apache.org/jira/browse/LUCENE-6939?focusedCommentId=15068054#comment-15068054
>
>
> About your question:
> "does SolrCloud totally support suggesters?"
> Yes, SolrCloud supports the BlendedInfixSuggester to some extend.
> What worked for us was buildOnCommit=true
>
> We used 2 collections one is live, the other one is in stand-by mode.
> We update the stand-by one in batches and we commit at the end...
> triggering the suggester rebuilt
> Then we swap the stand-by to become the live collection using aliases.
>
>
> Arcadius
>
>
> On 31 March 2016 at 18:04, xavi jmlucjav  wrote:
>
> > Hi,
> >
> > I have been working with
> > AnalyzingInfixLookupFactory/BlendedInfixLookupFactory in 5.5.0, and I
> have
> > a number of questions/comments, hopefully I get some insight into this:
> >
> > - Doc not complete/up-to-date:
> > - blenderType param does not accept 'linear' value, it did in 5.3. I
> > commented it out as it's the default.
> > - it should be mentioned contextField must be a stored field
> > - if the field used is whitespace tokenized, and you search for 'one t',
> > the suggestions are sorted by weight, not score. So if you give a
> constant
> > score to all docs, you might get this:
> > 1. one four two
> > 2. one two four
> >   Would taking the score into account (something not done yet but could
> be
> > done according to something I saw in code/jira) return 2,1 instead of
> 1,2?
> > My guess is it would, correct?
> > - what would we need to return the score too? Could it be done easily?
> > along with the payload or something.
> > - would it be possible to make BlendedInfixLookupFactory allow for some
> > fuzziness a la FuzzyLookupFactory?
> > - when building a big suggester, it can take a long time, you just send a
> > request with suggest.build=true and wait. Is there any possible way to
> > monitor the progress of this? I did not find one.
> > - for weightExpression, one typical use case would be to provide the
> users'
> > lat/lon to weight the suggestions by proximity, is this somehow feasible?
> > What would be needed?
> > - does SolrCloud totally support suggesters? If so does each shard build
> > its own suggester and it works just like a normal distributed search ?
> > - I filled SOLR-8928 suggest.cfq does not work with
> > DocumentExpressionDictionaryFactory/weightExpression as I found that
> combo
> > not working.
> >
> > regards
> > xavi
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Applied Knowledge Is Power
> M: 07908761999
> W: www.menelic.com
> ---
>


issues using BlendedInfixLookupFactory in solr5.5

2016-03-31 Thread xavi jmlucjav
Hi,

I have been working with
AnalyzingInfixLookupFactory/BlendedInfixLookupFactory in 5.5.0, and I have
a number of questions/comments, hopefully I get some insight into this:

- Doc not complete/up-to-date:
- blenderType param does not accept 'linear' value, it did in 5.3. I
commented it out as it's the default.
- it should be mentioned contextField must be a stored field
- if the field used is whitespace tokenized, and you search for 'one t',
the suggestions are sorted by weight, not score. So if you give a constant
score to all docs, you might get this:
1. one four two
2. one two four
  Would taking the score into account (something not done yet but could be
done according to something I saw in code/jira) return 2,1 instead of 1,2?
My guess is it would, correct?
- what would we need to return the score too? Could it be done easily?
along with the payload or something.
- would it be possible to make BlendedInfixLookupFactory allow for some
fuzziness a la FuzzyLookupFactory?
- when building a big suggester, it can take a long time, you just send a
request with suggest.build=true and wait. Is there any possible way to
monitor the progress of this? I did not find one.
- for weightExpression, one typical use case would be to provide the users'
lat/lon to weight the suggestions by proximity, is this somehow feasible?
What would be needed?
- does SolrCloud totally support suggesters? If so does each shard build
its own suggester and it works just like a normal distributed search ?
- I filed SOLR-8928 (suggest.cfq does not work with
DocumentExpressionDictionaryFactory/weightExpression), as I found that
combo not working.

regards
xavi


Re: Stopping Solr JVM on OOM

2016-03-19 Thread xavi jmlucjav
In order to force an OOM, do this (sketched below):

- index a sizable amount of docs with a normal -Xmx; if you already have
350k docs indexed, that should be enough
- now stop Solr, decrease the memory to something like -Xmx15m, start it,
and run a query with a facet on a field with very high cardinality, asking
for all facet values. If that is not enough, add another facet field, etc.
This is a sure way to get an OOM.
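
A sketch of that second step (the port, collection and facet field names
are placeholders):

bin/solr stop -p 8983
bin/solr start -p 8983 -m 15m
curl "http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=high_cardinality_field&facet.limit=-1"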

On Mon, Mar 14, 2016 at 9:42 AM, Binoy Dalal  wrote:

> I set the heap to 16 mb and tried to index about 350k records using a DIH.
> This did throw an OOM for that particular thread in the console, but the
> oom script wasn't called and solr was running properly.
> Moreover, solr also managed to index all 350k records.
>
> Is this the correct way to o about getting solr to throw an oom?
> If so where did I go wrong?
> If not, what other alternative is there?
>
> Thanks.
>
> PS. I tried to start solr with really low memory (abt. 2k) but that just
> threw an error saying too small a heap and the JVM didn't start at all.
>
> On Mon, 14 Mar 2016, 07:57 Shawn Heisey,  wrote:
>
> > On 3/13/2016 8:13 PM, Binoy Dalal wrote:
> > > I made the necessary changes to that oom script?
> > > How does it look now?
> > > Also can you suggest some way of testing it with solr?
> > > How do I make solr oom on purpose?
> >
> > Set the java heap really small.  Not entirely sure what value to use.
> > I'd probably start with 32m and work my way down.  With a small enough
> > heap, you could probably produce OOM without even trying to USE Solr.
> >
> > Thanks,
> > Shawn
> >
> > --
> Regards,
> Binoy Dalal
>


Re: How is Tika used with Solr

2016-02-12 Thread xavi jmlucjav
Of course, but that code is very tricky, so if the extraction library takes
care of all that, it's a huge gain. The Aperture library I used worked very
well in that regard, and even though it did not use processes as Timothy
says, it never got stuck if I remember correctly.

On Fri, Feb 12, 2016 at 1:46 AM, Erick Erickson 
wrote:

> Well, I'd imagine you could spawn threads and monitor/kill them as
> necessary, although that doesn't deal with OOM errors
>
> FWIW,
> Erick
>
> On Thu, Feb 11, 2016 at 3:08 PM, xavi jmlucjav  wrote:
> > For sure, if I need heavy duty text extraction again, Tika would be the
> > obvious choice if it covers dealing with hangs. I never used tika-server
> > myself (not sure if it existed at the time) just used tika from my own
> jvm.
> >
> > On Thu, Feb 11, 2016 at 8:45 PM, Allison, Timothy B.  >
> > wrote:
> >
> >> x-post to Tika user's
> >>
> >> Y and n.  If you run tika app as:
> >>
> >> java -jar tika-app.jar <input_dir> <output_dir>
> >>
> >> It runs tika-batch under the hood (TIKA-1330 as part of TIKA-1302).
> This
> >> creates a parent and child process, if the child process notices a hung
> >> thread, it dies, and the parent restarts it.  Or if your OS gets upset
> with
> >> the child process and kills it out of self preservation, the parent
> >> restarts the child, or if there's an OOM...and you can configure how
> often
> >> the child shuts itself down (with parental restarting) to mitigate
> memory
> >> leaks.
> >>
> >> So, y, if your use case allows <input_dir> <output_dir>, then we now
> have
> >> that in Tika.
> >>
> >> I've been wanting to add a similar watchdog to tika-server ... any
> >> interest in that?
> >>
> >>
> >> -Original Message-
> >> From: xavi jmlucjav [mailto:jmluc...@gmail.com]
> >> Sent: Thursday, February 11, 2016 2:16 PM
> >> To: solr-user 
> >> Subject: Re: How is Tika used with Solr
> >>
> >> I have found that when you deal with large amounts of all sort of files,
> >> in the end you find stuff (pdfs are typically nasty) that will hang
> tika.
> >> That is even worse that a crash or OOM.
> >> We used aperture instead of tika because at the time it provided a
> >> watchdog feature to kill what seemed like a hanged extracting thread.
> That
> >> feature is super important for a robust text extracting pipeline. Has
> Tika
> >> gained such feature already?
> >>
> >> xavier
> >>
> >> On Wed, Feb 10, 2016 at 6:37 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >> > Timothy's points are absolutely spot-on. In production scenarios, if
> >> > you use the simple "run Tika in a SolrJ program" approach you _must_
> >> > abort the program on OOM errors and the like and  figure out what's
> >> > going on with the offending document(s). Or record the name somewhere
> >> > and skip it next time 'round. Or
> >> >
> >> > How much you have to build in here really depends on your use case.
> >> > For "small enough"
> >> > sets of documents or one-time indexing, you can get by with dealing
> >> > with errors one at a time.
> >> > For robust systems where you have to have indexing available at all
> >> > times and _especially_ where you don't control the document corpus,
> >> > you have to build something far more tolerant as per Tim's comments.
> >> >
> >> > FWIW,
> >> > Erick
> >> >
> >> > On Wed, Feb 10, 2016 at 4:27 AM, Allison, Timothy B.
> >> > 
> >> > wrote:
> >> > > I completely agree on the impulse, and for the vast majority of the
> >> > > time
> >> > (regular catchable exceptions), that'll work.  And, by vast majority,
> >> > aside from oom on very large files, we aren't seeing these problems
> >> > any more in our 3 million doc corpus (y, I know, small by today's
> >> > standards) from
> >> > govdocs1 and Common Crawl over on our Rackspace vm.
> >> > >
> >> > > Given my focus on Tika, I'm overly sensitive to the worst case
> >> > scenarios.  I find it encouraging, Erick, that you haven't seen these
> >> > types of problems, that users aren't complaining too often about
> >> > catastrophic failures of Tika within 

Re: How is Tika used with Solr

2016-02-11 Thread xavi jmlucjav
For sure, if I need heavy-duty text extraction again, Tika would be the
obvious choice if it covers dealing with hangs. I never used tika-server
myself (not sure if it existed at the time); I just used Tika from my own
JVM.

On Thu, Feb 11, 2016 at 8:45 PM, Allison, Timothy B. 
wrote:

> x-post to Tika user's
>
> Y and n.  If you run tika app as:
>
> java -jar tika-app.jar <input_dir> <output_dir>
>
> It runs tika-batch under the hood (TIKA-1330 as part of TIKA-1302).  This
> creates a parent and child process, if the child process notices a hung
> thread, it dies, and the parent restarts it.  Or if your OS gets upset with
> the child process and kills it out of self preservation, the parent
> restarts the child, or if there's an OOM...and you can configure how often
> the child shuts itself down (with parental restarting) to mitigate memory
> leaks.
>
> So, y, if your use case allows <input_dir> <output_dir>, then we now have
> that in Tika.
>
> I've been wanting to add a similar watchdog to tika-server ... any
> interest in that?
>
>
> -Original Message-
> From: xavi jmlucjav [mailto:jmluc...@gmail.com]
> Sent: Thursday, February 11, 2016 2:16 PM
> To: solr-user 
> Subject: Re: How is Tika used with Solr
>
> I have found that when you deal with large amounts of all sort of files,
> in the end you find stuff (pdfs are typically nasty) that will hang tika.
> That is even worse that a crash or OOM.
> We used aperture instead of tika because at the time it provided a
> watchdog feature to kill what seemed like a hanged extracting thread. That
> feature is super important for a robust text extracting pipeline. Has Tika
> gained such feature already?
>
> xavier
>
> On Wed, Feb 10, 2016 at 6:37 PM, Erick Erickson 
> wrote:
>
> > Timothy's points are absolutely spot-on. In production scenarios, if
> > you use the simple "run Tika in a SolrJ program" approach you _must_
> > abort the program on OOM errors and the like and  figure out what's
> > going on with the offending document(s). Or record the name somewhere
> > and skip it next time 'round. Or
> >
> > How much you have to build in here really depends on your use case.
> > For "small enough"
> > sets of documents or one-time indexing, you can get by with dealing
> > with errors one at a time.
> > For robust systems where you have to have indexing available at all
> > times and _especially_ where you don't control the document corpus,
> > you have to build something far more tolerant as per Tim's comments.
> >
> > FWIW,
> > Erick
> >
> > On Wed, Feb 10, 2016 at 4:27 AM, Allison, Timothy B.
> > 
> > wrote:
> > > I completely agree on the impulse, and for the vast majority of the
> > > time
> > (regular catchable exceptions), that'll work.  And, by vast majority,
> > aside from oom on very large files, we aren't seeing these problems
> > any more in our 3 million doc corpus (y, I know, small by today's
> > standards) from
> > govdocs1 and Common Crawl over on our Rackspace vm.
> > >
> > > Given my focus on Tika, I'm overly sensitive to the worst case
> > scenarios.  I find it encouraging, Erick, that you haven't seen these
> > types of problems, that users aren't complaining too often about
> > catastrophic failures of Tika within Solr Cell, and that this thread
> > is not yet swamped with integrators agreeing with me. :)
> > >
> > > However, because oom can leave memory in a corrupted state (right?),
> > because you can't actually kill a thread for a permanent hang and
> > because Tika is a kitchen sink and we can't prevent memory leaks in
> > our dependencies, one needs to be aware that bad things can
> > happen...if only very, very rarely.  For a fellow traveler who has run
> > into these issues on massive data sets, see also [0].
> > >
> > > Configuring Hadoop to work around these types of problems is not too
> > difficult -- it has to be done with some thought, though.  On
> > conventional single box setups, the ForkParser within Tika is one
> > option, tika-batch is another.  Hand rolling your own parent/child
> > process is non-trivial and is not necessary for the vast majority of use
> cases.
> > >
> > >
> > > [0]
> > http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-w
> > eb-content-nanite/
> > >
> > >
> > >
> > > -Original Message-
> > > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > > Sent: Tuesday, February 09, 2016 10

Re: How is Tika used with Solr

2016-02-11 Thread xavi jmlucjav
I have found that when you deal with large amounts of all sorts of files,
in the end you find stuff (PDFs are typically nasty) that will hang Tika.
That is even worse than a crash or an OOM.
We used Aperture instead of Tika because at the time it provided a watchdog
feature to kill what seemed like a hung extracting thread. That feature is
super important for a robust text extraction pipeline. Has Tika gained such
a feature already?

xavier

On Wed, Feb 10, 2016 at 6:37 PM, Erick Erickson 
wrote:

> Timothy's points are absolutely spot-on. In production scenarios, if
> you use the simple
> "run Tika in a SolrJ program" approach you _must_ abort the program on
> OOM errors
> and the like and  figure out what's going on with the offending
> document(s). Or record the
> name somewhere and skip it next time 'round. Or
>
> How much you have to build in here really depends on your use case.
> For "small enough"
> sets of documents or one-time indexing, you can get by with dealing
> with errors one at a time.
> For robust systems where you have to have indexing available at all
> times and _especially_
> where you don't control the document corpus, you have to build
> something far more
> tolerant as per Tim's comments.
>
> FWIW,
> Erick
>
> On Wed, Feb 10, 2016 at 4:27 AM, Allison, Timothy B. 
> wrote:
> > I completely agree on the impulse, and for the vast majority of the time
> (regular catchable exceptions), that'll work.  And, by vast majority, aside
> from oom on very large files, we aren't seeing these problems any more in
> our 3 million doc corpus (y, I know, small by today's standards) from
> govdocs1 and Common Crawl over on our Rackspace vm.
> >
> > Given my focus on Tika, I'm overly sensitive to the worst case
> scenarios.  I find it encouraging, Erick, that you haven't seen these types
> of problems, that users aren't complaining too often about catastrophic
> failures of Tika within Solr Cell, and that this thread is not yet swamped
> with integrators agreeing with me. :)
> >
> > However, because oom can leave memory in a corrupted state (right?),
> because you can't actually kill a thread for a permanent hang and because
> Tika is a kitchen sink and we can't prevent memory leaks in our
> dependencies, one needs to be aware that bad things can happen...if only
> very, very rarely.  For a fellow traveler who has run into these issues on
> massive data sets, see also [0].
> >
> > Configuring Hadoop to work around these types of problems is not too
> difficult -- it has to be done with some thought, though.  On conventional
> single box setups, the ForkParser within Tika is one option, tika-batch is
> another.  Hand rolling your own parent/child process is non-trivial and is
> not necessary for the vast majority of use cases.
> >
> >
> > [0]
> http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-web-content-nanite/
> >
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, February 09, 2016 10:05 PM
> > To: solr-user 
> > Subject: Re: How is Tika used with Solr
> >
> > My impulse would be to _not_ run Tika in its own JVM, just catch any
> exceptions in my code and "do the right thing". I'm not sure I see any real
> benefit in yet another JVM.
> >
> > FWIW,
> > Erick
> >
> > On Tue, Feb 9, 2016 at 6:22 PM, Allison, Timothy B. 
> wrote:
> >> I have one answer here [0], but I'd be interested to hear what Solr
> users/devs/integrators have experienced on this topic.
> >>
> >> [0]
> >> http://mail-archives.apache.org/mod_mbox/tika-user/201602.mbox/%3CCY1P
> >> R09MB0795EAED947B53965BC86874C7D70%40CY1PR09MB0795.namprd09.prod.outlo
> >> ok.com%3E
> >>
> >> -Original Message-
> >> From: Steven White [mailto:swhite4...@gmail.com]
> >> Sent: Tuesday, February 09, 2016 6:33 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: How is Tika used with Solr
> >>
> >> Thank you Erick and Alex.
> >>
> >> My main question is with a long running process using Tika in the same
> JVM as my application.  I'm running my file-system-crawler in its own JVM
> (not Solr's).  On Tika mailing list, it is suggested to run Tika's code in
> it's own JVM and invoke it from my file-system-crawler using
> Runtime.getRuntime().exec().
> >>
> >> I fully understand from Alex suggestion and link provided by Erick to
> use Tika outside Solr.  But what about using Tika within the same JVM as my
> file-system-crawler application or should I be making a system call to
> invoke another JAR, that runs in its own JVM to extract the raw text?  Are
> there known issues with Tika when used in a long running process?
> >>
> >> Steve
> >>
> >>
>


Re: Json Facet api on nested doc

2015-11-24 Thread xavi jmlucjav
Mikhail, Yonik

thanks for having a look. This was my bad all along... I forgot I was on
5.2.1 instead of 5.3.1 on this setup!! It seems some things were not there
yet in 5.2.1; I just upgraded to 5.3.1 and my query works perfectly.

Although I do agree with Mikhail that the docs on this feature are a bit
light, it is understandable, as the feature is quite new.

thanks
xavi

On Mon, Nov 23, 2015 at 9:24 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Indeed! Now it works for me too. JSON Facets seems powerful, but not
> friendly to me.
> Yonik, thanks for example!
>
> Xavi,
>
> I took  json docs from http://yonik.com/solr-nested-objects/ and just
> doubled book2_c3
>
> Here is what I have with json.facet={catz: {type:terms,field:cat_s,
> facet:{ starz:{type:terms, field:stars_i,
> domain:{blockChildren:'type_s:book'}} }}}
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 2,
> "params": {
>   "q": "publisher_s:*",
>   "json.facet": "{catz: {type:terms,field:cat_s, facet:{ 
> starz:{type:terms, field:stars_i, domain:{blockChildren:'type_s:book'}} }}}",
>   "indent": "true",
>   "wt": "json",
>   "_": "1448309900982"
> }
>   },
>   "response": {
> "numFound": 2,
> "start": 0,
> "docs": [
>   {
> "id": "book1",
> "type_s": "book",
> "title_t": "The Way of Kings",
> "author_s": "Brandon Sanderson",
> "cat_s": "fantasy",
> "pubyear_i": 2010,
> "publisher_s": "Tor",
> "_version_": 1518570756086169600
>   },
>   {
> "id": "book2",
> "type_s": "book",
> "title_t": "Snow Crash",
> "author_s": "Neal Stephenson",
> "cat_s": "sci-fi",
> "pubyear_i": 1992,
> "publisher_s": "Bantam",
> "_version_": 1518570908026929200
>   }
> ]
>   },
>   "facets": {
> "count": 2,
> "catz": {
>   "buckets": [
> {
>   "val": "fantasy",
>   "count": 1,
>   "starz": {
> "buckets": [
>   {
> "val": 3,
> "count": 1
>   },
>   {
> "val": 5,
> "count": 1
>   }
> ]
>   }
> },
> {
>   "val": "sci-fi",
>   "count": 1,
>   "starz": {
> "buckets": [
>   {
> "val": 2,
> "count": 2
>   },
>   {
> "val": 4,
> "count": 1
>   },
>   {
> "val": 5,
> "count": 1
>   }
> ]
>   }
> }
>   ]
> }
>   }
> }
>
> It works well with *:* too.
>
>
> On Mon, Nov 23, 2015 at 12:56 AM, Yonik Seeley  wrote:
>
>> On Sun, Nov 22, 2015 at 3:10 PM, Mikhail Khludnev
>>  wrote:
>> > Hello,
>> >
>> > I also played with json.facet, but couldn't achieve the desired result
>> too.
>> >
>> > Yonik, Alessandro,
>> > Do you think it's a new feature or it can be achieved with the current
>> > implementation?
>>
>> Not sure if I'm misunderstanding the example, but it looks
>> straight-forward.
>>
>> terms facet on parent documents, with sub-facet on child documents.
>> I just committed a test for this, and it worked fine.  See
>> TestJsonFacets.testBlockJoin()
>>
>> Can we see an example of a parent document being indexed (i.e. along
>> with it's child documents)?
>>
>> -Yonik
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Json Facet api on nested doc

2015-11-19 Thread xavi jmlucjav
Hi,

I am trying to get some faceting with the JSON Facet API on nested docs, but
I am having issues. Solr 5.3.1.

This query gets the bucket numbers OK:

curl http://shost:8983/solr/collection1/query -d 'q=*:*&rows=0&
 json.facet={
   yearly-salaries : {
type: terms,
field: salary,
domain: { blockChildren : "parent:true" }
  }
 }
'
Salary is a field in child docs only. But if I add another facet outside
it, the inner one returns no data:

curl http://shost:8983/solr/collection1/query -d 'q=*:*&rows=0&
 json.facet={
department:{
   type: terms,
   field: department,
   facet:{
   yearly-salaries : {
type: terms,
field: salary,
domain: { blockChildren : "parent:true" }
  }
  }
  }
 }
'
Results in:

"facets":{

 "count":3144071,

"department":{

"buckets":[{

"val":"Development",

"count":85707,

"yearly-salaries":{

"buckets":[]}},


department is a field only in parent docs. Am I doing something wrong, or
missing something?
thanks
xavi


Schemaless mode and DIH

2015-08-06 Thread xavi jmlucjav
hi,

While working with DIH, I tried schemaless mode and found out it does not
work if you are indexing with DIH. I could not find any issue or reference
to this on the mailing list, even though I find it a bit surprising that nobody
has tried that combination so far. Has anybody tested this before?

I managed to fix it for my small use case, I opened a ticket for it with
the patch https://issues.apache.org/jira/browse/SOLR-7882

thanks


BlendedInfixLookupFactory does not respect suggest.count in 5.2?

2015-06-07 Thread xavi jmlucjav
Hi,

I have a setup with AnalyzingInfixLookupFactory where suggest.count works. But
if I just replace:
s/AnalyzingInfixLookupFactory/BlendedInfixLookupFactory
suggest.count is not respected anymore and all suggestions are returned,
making it virtually useless.

I am using RC4, which I believe is the one being released.

xavi


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
On Sat, May 30, 2015 at 11:15 PM, Toke Eskildsen 
wrote:

> xavi jmlucjav  wrote:
> > I think the plan is to facet only on class_u1, class_u2 for queries from
> > user1, etc. So faceting would not happen on all fields on a single query.
>
> I understand that, but most of the created structures stays in memory
> between calls (DocValues helps here). Your heap will slowly fill up as more
> and more users perform faceted queries on their content.
>
got it...priceless info, thanks!


>
> - Toke Eskildsen
>


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
Thanks Toke for the input.

I think the plan is to facet only on class_u1, class_u2 for queries from
user1, etc. So faceting would not happen on all fields on a single query.
But still.

I did not design the schema; I just found out about the number of fields and
advised against it when they asked for a second opinion. We did not get to
discuss a different schema, but if we get to that point I will take that
plan into consideration for sure.

xavi

On Sat, May 30, 2015 at 10:17 PM, Toke Eskildsen 
wrote:

> xavi jmlucjav  wrote:
> > They reason for such a large number of fields:
> > - users create dynamically 'classes' of documents, say one user creates
> 10
> > classes on average
> > - for each 'class', the fields are created like this:
> "unique_id_"+fieldname
> > - there are potentially hundreds of thousands of users.
>
> Switch to a scheme where you control the names of fields outside of Solr,
> but share the fields internally:
>
> User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
> Internally they are mapped to class1, class2, class3, ... class10
>
> User 2 uses 2 classes: u2_horses, u2_elephants
> Internally they are mapped to class1, class2
>
> When User 2 queries field u2_horses, you rewrite the query to use class1
> instead.
>
> > There is faceting in each users' fields.
> > So this will result in >1M fields, very sparsely populated.
>
> If you are faceting on all of them and if you are not using DocValues,
> this will explode your memory requirements with vanilla Solr: UnInverted
> faceting maintains separate a map from all documentIDs to field values
> (ordinals for Strings) for _all_ the facet fields. Even if you only had 10
> million documents and even if your 1 million facet fields all had just 1
> value, represented by 1 bit, it would still require 10M * 1M * 1 bits in
> memory, which is 10 terabyte of RAM.
>
> - Toke Eskildsen
>


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
They reason for such a large number of fields:
- users create dynamically 'classes' of documents, say one user creates 10
classes on average
- for each 'class', the fields are created like this: "unique_id_"+fieldname
- there are potentially hundreds of thousands of users.

There is faceting in each users' fields.

So this will result in >1M fields, very sparsely populated. I warned them that
this did not sound like a good design to me, but apparently someone very
knowledgeable in Solr said this will work out fine. That is why I wanted to
double-check...

On Sat, May 30, 2015 at 9:22 PM, Jack Krupansky 
wrote:

> Anything more than a few hundred seems very suspicious.
>
> Anything more than a few dozen or 50 or 75 seems suspicious as well.
>
> The point should not be how crazy can you get with Solr, but that craziness
> should be avoided altogether!
>
> Solr's design is optimal for a large number of relatively small documents,
> not large documents.
>
>
> -- Jack Krupansky
>
> On Sat, May 30, 2015 at 3:05 PM, Erick Erickson 
> wrote:
>
> > Nothing's really changed in that area lately. Your co-worker is
> > perhaps confusing the statement that "Solr has no a-priori limit on
> > the number of distinct fields that can be in a corpus" with supporting
> > an infinite number of fields. Not having a built-in limit is much
> > different than supporting
> >
> > Whether Solr breaks with thousands and thousands of fields is pretty
> > dependent on what you _do_ with those fields. Simply doing keyword
> > searches isn't going to put the same memory pressure on as, say,
> > faceting on them all (even if in different queries).
> >
> > I'd really ask why so many fields are necessary though.
> >
> > Best,
> > Erick
> >
> > On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav 
> wrote:
> > > Hi guys,
> > >
> > > someone I work with has been advised that currently Solr can support
> > > 'infinite' number of fields.
> > >
> > > I thought there was a practical limitation of say thousands of fields
> > (for
> > > sure less than a million), or things can start to break (I think I
> > > remember seeing memory issues reported on the mailing list by several
> > > people).
> > >
> > >
> > > Was there any change I missed lately that makes having say 1M fields in
> > > Solr practical??
> > >
> > > thanks
> >
>


any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
Hi guys,

someone I work with has been advised that currently Solr can support an
'infinite' number of fields.

I thought there was a practical limitation of, say, thousands of fields (for
sure less than a million), or things can start to break (I think I
remember seeing memory issues reported on the mailing list by several
people).


Was there any change I missed lately that makes having say 1M fields in
Solr practical??

thanks


Re: Using a RequestHandler to expand query parameter

2014-09-09 Thread jmlucjav
This is easily doable with a custom (Java code) request handler. If you want
to avoid writing any Java code, you should investigate using
https://issues.apache.org/jira/browse/SOLR-5005 (I am going to have
a look at this interesting feature myself)

On Tue, Sep 9, 2014 at 4:33 PM, jimtronic  wrote:

> Never got a response on this ... Just looking for the best way to handle
> it?
>
>
>
>
>
>


Re: sample Cell schema question

2014-08-19 Thread jmlucjav
No, it does not.

Here the intent, I think, is not to duplicate stored info: other metadata
fields like author, keywords, etc. are already stored, so if 'text' were
stored (text is where all the fields, such as content and author, are copied),
it would contain some duplicate info.


On Tue, Aug 19, 2014 at 1:05 PM, Aman Tandon 
wrote:

> I have a question, does storing the data in copyfields save space?
>
> With Regards
> Aman Tandon
>
>
> On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav  wrote:
>
> > ok, I had not noticed text contains also the other metadata like
> keywords,
> > description etc, nevermind!
> >
> >
> > On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav  wrote:
> >
> > > In the sample schema.xml I can see this:
> > >
> > > 
> > >  > > stored="true" multiValued="true"/>
> > >
> > >
> > > I am wondering, how does having this split in two fields text/content
> > save
> > > space?
> > >
> >
>


Re: sample Cell schema question

2014-08-19 Thread jmlucjav
OK, I had not noticed that text also contains the other metadata like keywords,
description, etc. Never mind!


On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav  wrote:

> In the sample schema.xml I can see this:
>
> 
>  stored="true" multiValued="true"/>
>
>
> I am wondering, how does having this split in two fields text/content save
> space?
>


sample Cell schema question

2014-08-19 Thread jmlucjav
In the sample schema.xml I can see this:





I am wondering: how does having this split into two fields (text/content) save
space?


Re: solr wiki: 'Support for Solr' page edit policy

2014-07-17 Thread jmlucjav
appreciated Stefan. Done updating.


On Thu, Jul 17, 2014 at 5:36 PM, Stefan Matheis 
wrote:

> Xavi
>
> It’s the former :) I’ve added you to the contributors group
>
> -Stefan
>
>
> On Thursday, July 17, 2014 at 5:19 PM, jmlucjav wrote:
>
> > Hi guys,
> >
> > I don't remember anymore what is the policy to have someone added to this
> > page:
> >
> > - ask for edit rights and add your own line where needed
> > - send someone your line and they'll add it for you.
> >
> > If the former, could I get edit permissions for the wiki? My login is
> > jmlucjav. If the later, who could I send it to?
> >
> > thanks!
> > xavi
> >
> >
>
>
>


solr wiki: 'Support for Solr' page edit policy

2014-07-17 Thread jmlucjav
Hi guys,

I don't remember anymore what the policy is for having someone added to this
page:

- ask for edit rights and add your own line where needed
- send someone your line and they'll add it for you.

If the former, could I get edit permissions for the wiki? My login is
jmlucjav. If the latter, who could I send it to?

thanks!
xavi


[ANN] sadat: generate fake docs for your Solr index

2014-03-17 Thread xavier jmlucjav
Hi,

A couple of times I found myself in the following situation: I had to work
on a Solr schema, but had no docs to index yet (the DB was not ready, etc.).

In order to start learning JS, I needed some small project to practice on, so
I thought of this small utility. It allows you to generate fake docs to
index, so you can at least advance with the schema/solrconfig design.

Currently it allows you (based on your current schema) to generate the most
basic field types (int, float, boolean, text, date), and user-defined
functions can be plugged in for customized generation.

Have a look at https://github.com/jmlucjav/sadat


how to best convert some term in q to a fq

2013-12-23 Thread jmlucjav
Hi,

I have this scenario that I think is not unusual: Solr will get a user-entered
query string like 'apple pear france'.

I need to do this: if any of the terms is a country, then change the query
params to move that term to a fq, i.e.:
q=apple pear france
to
q=apple pear&fq=country:france

What do you guys think would be the best way to implement this?
- custom SearchComponent or query parser (a rough sketch of this option follows below)
- servlet in the same Jetty as Solr
- client code

To simplify, consider countries to be just a single term.

Any pointer to an example to base this on would be great. thanks
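
A possible shape for the SearchComponent option, just as a sketch (the class
name and country list are made up, it assumes Solr 4.x APIs, and it would need
to be registered as a first-components entry in solrconfig.xml):

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical component: pulls country terms out of q and re-adds them as
// fq=country:<term> before the query component runs.
public class CountryToFqComponent extends SearchComponent {

  // placeholder dictionary; in real life this would be loaded from a file or service
  private static final Set<String> COUNTRIES =
      new HashSet<String>(Arrays.asList("france", "spain", "germany"));

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    String q = rb.req.getParams().get(CommonParams.Q);
    if (q == null) return;

    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    StringBuilder rest = new StringBuilder();
    for (String term : q.trim().split("\\s+")) {
      if (COUNTRIES.contains(term.toLowerCase())) {
        params.add(CommonParams.FQ, "country:" + term);   // move the term to a filter
      } else {
        if (rest.length() > 0) rest.append(' ');
        rest.append(term);
      }
    }
    params.set(CommonParams.Q, rest.length() > 0 ? rest.toString() : "*:*");
    rb.req.setParams(params);
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // all the work happens in prepare(), nothing to do here
  }

  @Override
  public String getDescription() { return "moves country terms from q to fq"; }

  @Override
  public String getSource() { return "(sketch)"; }
}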


Re: When is/should qf different from pf?

2013-10-29 Thread xavier jmlucjav
I am confused: wouldn't a doc that matches both the phrase and the term
queries have a better score than a doc matching only the term query, even
if qf and pf are the same??


On Mon, Oct 28, 2013 at 7:54 PM, Upayavira  wrote:

> There'd be no point having them the same.
>
> You're likely to include boosts in your pf, so that docs that match the
> phrase query as well as the term query score higher than those that just
> match the term query.
>
> Such as:
>
>   qf=text description&pf=text^2 description^4
>
> Upayavira
>
> On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote:
> > Thanks Erick. Numeric fields make sense as I guess would strictly string
> > fields too since its one  term? In the normal text searching case though
> > does it make sense to have qf and pf differ?
> >
> > Thanks
> > Amit
> > On Oct 28, 2013 3:36 AM, "Erick Erickson" 
> > wrote:
> >
> > > The facetious answer is "when phrases aren't important in the fields".
> > > If you're doing a simple boolean match, adding phrase fields will add
> > > expense, to no good purpose etc. Phrases on numeric
> > > fields seems wrong.
> > >
> > > FWIW,
> > > Erick
> > >
> > >
> > > On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have been using Solr for years but never really stopped to wonder:
> > > >
> > > > When using the dismax/edismax handler, when do you have the qf
> different
> > > > from the pf?
> > > >
> > > > I have always set them to be the same (maybe different weights) but
> I was
> > > > wondering if there is a situation where you would have a field in
> the qf
> > > > not in the pf or vice versa.
> > > >
> > > > My understanding from the docs is that qf is a term-wise hard filter
> > > while
> > > > pf is a phrase-wise boost of documents who made it past the "qf"
> filter.
> > > >
> > > > Thanks!
> > > > Amit
> > > >
> > >
>


Re: Query Elevation Component

2013-06-05 Thread jmlucjav
davers wrote
> I want to elevate certain documents differently depending a a certain fq
> parameter in the request. I've read of somebody coding solr to do this but
> no code was shared. Where would I start looking to implement this feature
> myself?

Davers,

I am also looking into this feature. Care to tell where you saw this
discussed? I could not find anything. Also, did you manage to implement this
somehow?

thanks






Re: do SearchComponents have access to response contents

2013-04-05 Thread xavier jmlucjav
I knew I could do that at the Jetty level with a servlet, for instance, but the
user wants to do this stuff inside Solr code itself. Now that you mention
the logs... that could be a solution without modifying the webapp...

thanks for the input!
xavier


On Fri, Apr 5, 2013 at 7:55 AM, Amit Nithian  wrote:

> "We need to also track the size of the response (as the size in bytes of
> the
> whole xml response tat is streamed, with stored fields and all). I was a
> bit worried cause I am wondering if a searchcomponent will actually have
> access to the response bytes..."
>
> ==> Can't you get this from your container access logs after the fact? I
> may be misunderstanding something but why wouldn't mining the Jetty/Tomcat
> logs for the response size here suffice?
>
> Thanks!
> Amit
>
>
> On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav 
> wrote:
>
> > A custom QueryResponseWriter...this makes sense, thanks Jack
> >
> >
> > On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky  > >wrote:
> >
> > > The search components can see the "response" as a namedlist, but it is
> > > only when SolrDispatchFIlter calls the QueryResponseWriter that XML or
> > JSON
> > > or whatever other format (Javabin as well) is generated from the named
> > list
> > > for final output in an HTTP response.
> > >
> > > You probably want a custom query response writer that wraps the XML
> > > response writer. Then you can generate the XML and then do whatever you
> > > want with it.
> > >
> > > The QueryResponseWriter class and  in
> > solrconfig.xml.
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: xavier jmlucjav
> > > Sent: Wednesday, April 03, 2013 4:22 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: do SearchComponents have access to response contents
> > >
> > >
> > > I need to implement some SearchComponent that will deal with metrics on
> > the
> > > response. Some things I see will be easy to get, like number of hits
> for
> > > instance, but I am more worried with this:
> > >
> > > We need to also track the size of the response (as the size in bytes of
> > the
> > > whole xml response tat is streamed, with stored fields and all). I was
> a
> > > bit worried cause I am wondering if a searchcomponent will actually
> have
> > > access to the response bytes...
> > >
> > > Can someone confirm one way or the other? We are targeting Sorl4.0
> > >
> > > thanks
> > > xavier
> > >
> >
>


Re: do SearchComponents have access to response contents

2013-04-04 Thread xavier jmlucjav
A custom QueryResponseWriter...this makes sense, thanks Jack


On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky wrote:

> The search components can see the "response" as a namedlist, but it is
> only when SolrDispatchFIlter calls the QueryResponseWriter that XML or JSON
> or whatever other format (Javabin as well) is generated from the named list
> for final output in an HTTP response.
>
> You probably want a custom query response writer that wraps the XML
> response writer. Then you can generate the XML and then do whatever you
> want with it.
>
> The QueryResponseWriter class and  in solrconfig.xml.
>
> -- Jack Krupansky
>
> -Original Message- From: xavier jmlucjav
> Sent: Wednesday, April 03, 2013 4:22 PM
> To: solr-user@lucene.apache.org
> Subject: do SearchComponents have access to response contents
>
>
> I need to implement some SearchComponent that will deal with metrics on the
> response. Some things I see will be easy to get, like number of hits for
> instance, but I am more worried with this:
>
> We need to also track the size of the response (as the size in bytes of the
> whole xml response tat is streamed, with stored fields and all). I was a
> bit worried cause I am wondering if a searchcomponent will actually have
> access to the response bytes...
>
> Can someone confirm one way or the other? We are targeting Sorl4.0
>
> thanks
> xavier
>
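
A minimal sketch of the wrapping response writer idea discussed above (the
class name and the way the metric is reported are made up; note it buffers the
whole response in memory before streaming it, which may not be acceptable for
very large responses):

import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.response.XMLResponseWriter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical writer: renders the normal XML response into a buffer, records
// its size in bytes, then streams it to the client unchanged.
public class SizeTrackingXmlWriter implements QueryResponseWriter {

  private static final Logger log = LoggerFactory.getLogger(SizeTrackingXmlWriter.class);
  private final XMLResponseWriter delegate = new XMLResponseWriter();

  @Override
  public void init(NamedList args) {}

  @Override
  public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
      throws IOException {
    StringWriter buffer = new StringWriter();
    delegate.write(buffer, request, response);        // build the full XML first
    String xml = buffer.toString();
    log.info("response size in bytes: {}", xml.getBytes("UTF-8").length);
    writer.write(xml);                                // then stream it unchanged
  }

  @Override
  public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
    return delegate.getContentType(request, response);
  }
}

It would be registered with a queryResponseWriter entry in solrconfig.xml
(using name="xml" if it should replace the stock XML writer).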


do SearchComponents have access to response contents

2013-04-03 Thread xavier jmlucjav
I need to implement some SearchComponent that will deal with metrics on the
response. Some things I see will be easy to get, like the number of hits for
instance, but I am more worried about this:

We also need to track the size of the response (as the size in bytes of the
whole XML response that is streamed, with stored fields and all). I was a
bit worried because I am wondering if a SearchComponent will actually have
access to the response bytes...

Can someone confirm one way or the other? We are targeting Solr 4.0.

thanks
xavier


Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Damn... I got fixated on seeing the 14 there... I had naively thought that
the term freq would not be stored in the doc (1 would be stored instead), but
I guess it still stores the real value and then applies the custom similarity
at query time.

That means changing to a custom similarity does not require reindexing, right?

thanks for the help!
xavier


On Thu, Mar 21, 2013 at 5:26 PM, Chris Hostetter
wrote:

> : > public class NoTfSimilarity extends DefaultSimilarity {
> : > public float tf(float freq) {
> : > return freq > 0 ? 1.0f : 0.0f;
> : > }
> : > }
> ...
>
> : > But I still see tf=14 in my query??
> ...
> : > 1.0 = tf(freq=14.0), with freq of:
> : >   14.0 = termFreq=14.0
>
> pretty sure you are looking at the explanation of the *input* to your tf()
> function, not that the *output* is 1.0, just like in your function.
>
> Did you compare this to what you see using the DefaultSimilarity?
>
>
>
> -Hoss
>


Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Steve,

yes, as I already included (though maybe is not very visible) I have this
before  element:


I can see explain info is indeed different, for example I have [] instead
of [DefaultSimilarity]

thanks



On Thu, Mar 21, 2013 at 3:08 PM, Steve Rowe  wrote:

> Hi xavier,
>
> Have you set the global similarity to solr.SchemaSimilarityFactory?
>
> See <http://wiki.apache.org/solr/SchemaXml#Similarity>.
>
> Steve
>
> On Mar 21, 2013, at 9:44 AM, xavier jmlucjav  wrote:
>
> > Hi Felipe,
> >
> > I need to keep positions, that is why I cannot just use
> > omitTermFreqAndPositions
> >
> >
> > On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti  >wrote:
> >
> >> Do you really need a custom similarity?
> >> Did you try to put the attribute "omitTermFreqAndPositions" in your
> field?
> >>
> >> It could be:
> >>
> >>  >> indexed="true" stored="true"  multiValued="false" omitNorms="true" />
> >>
> >> http://wiki.apache.org/solr/SchemaXml
> >>
> >>
> >> On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav 
> >> wrote:
> >>
> >>> I have the following setup:
> >>>
> >>> >>> positionIncrementGap="100">
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> >>> stored="true"   multiValued="false" omitNorms="true" />
> >>>
> >>> I index my corpus, and I can see tf is as usual, in this doc is 14
> times
> >> in
> >>> this field:
> >>> 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
> >>> [DefaultSimilarity], result of:
> >>>  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >>>0.14165252 = queryWeight, product of:
> >>>  10.0 = boost
> >>>  8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>  0.0016648936 = queryNorm
> >>>31.834784 = fieldWeight in 440, product of:
> >>>  3.7416575 = tf(freq=14.0), with freq of:
> >>>14.0 = termFreq=14.0
> >>>  8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>  1.0 = fieldNorm(doc=440)
> >>>
> >>>
> >>> Then I modify my schema:
> >>>
> >>>
> >>> >>> positionIncrementGap="100">
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> I just want to disable term freq > 1, so a term its either present or
> >> not.
> >>>
> >>> public class NoTfSimilarity extends DefaultSimilarity {
> >>>public float tf(float freq) {
> >>>return freq > 0 ? 1.0f : 0.0f;
> >>>}
> >>> }
> >>>
> >>> But I still see tf=14 in my query??
> >>> 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result
> of:
> >>>723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product
> of:
> >>>  85.08203 = queryWeight, product of:
> >>>10.0 = boost
> >>>8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>1.0 = queryNorm
> >>>  8.5082035 = fieldWeight in 440, product of:
> >>>1.0 = tf(freq=14.0), with freq of:
> >>>  14.0 = termFreq=14.0
> >>>8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>1.0 = fieldNorm(doc=440)
> >>>
> >>> anyone sees what I am missing?
> >>> I am on solr4.0
> >>>
> >>> thanks
> >>> xavier
> >>>
> >>
> >>
> >>
> >> --
> >> Felipe Lahti
> >> Consultant Developer - ThoughtWorks Porto Alegre
> >>
>
>


Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Hi Felipe,

I need to keep positions, that is why I cannot just use
omitTermFreqAndPositions


On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti wrote:

> Do you really need a custom similarity?
> Did you try to put the attribute "omitTermFreqAndPositions" in your field?
>
> It could be:
>
>  indexed="true" stored="true"  multiValued="false" omitNorms="true" />
>
> http://wiki.apache.org/solr/SchemaXml
>
>
> On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav 
> wrote:
>
> > I have the following setup:
> >
> >  > positionIncrementGap="100">
> > 
> > 
> > 
> > 
> > 
> >  > stored="true"   multiValued="false" omitNorms="true" />
> >
> > I index my corpus, and I can see tf is as usual, in this doc is 14 times
> in
> > this field:
> > 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
> > [DefaultSimilarity], result of:
> >   4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> > 0.14165252 = queryWeight, product of:
> >   10.0 = boost
> >   8.5082035 = idf(docFreq=30, maxDocs=56511)
> >   0.0016648936 = queryNorm
> > 31.834784 = fieldWeight in 440, product of:
> >   3.7416575 = tf(freq=14.0), with freq of:
> > 14.0 = termFreq=14.0
> >   8.5082035 = idf(docFreq=30, maxDocs=56511)
> >   1.0 = fieldNorm(doc=440)
> >
> >
> > Then I modify my schema:
> >
> > 
> >  > positionIncrementGap="100">
> > 
> > 
> > 
> > 
> > 
> > 
> >
> > I just want to disable term freq > 1, so a term its either present or
> not.
> >
> > public class NoTfSimilarity extends DefaultSimilarity {
> > public float tf(float freq) {
> > return freq > 0 ? 1.0f : 0.0f;
> > }
> > }
> >
> > But I still see tf=14 in my query??
> > 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
> > 723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >   85.08203 = queryWeight, product of:
> > 10.0 = boost
> > 8.5082035 = idf(docFreq=30, maxDocs=56511)
> > 1.0 = queryNorm
> >   8.5082035 = fieldWeight in 440, product of:
> > 1.0 = tf(freq=14.0), with freq of:
> >   14.0 = termFreq=14.0
> > 8.5082035 = idf(docFreq=30, maxDocs=56511)
> > 1.0 = fieldNorm(doc=440)
> >
> > anyone sees what I am missing?
> > I am on solr4.0
> >
> > thanks
> > xavier
> >
>
>
>
> --
> Felipe Lahti
> Consultant Developer - ThoughtWorks Porto Alegre
>


custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
I have the following setup:









I index my corpus, and I can see tf is as usual; in this doc the term appears
14 times in this field:
4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
[DefaultSimilarity], result of:
  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
0.14165252 = queryWeight, product of:
  10.0 = boost
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  0.0016648936 = queryNorm
31.834784 = fieldWeight in 440, product of:
  3.7416575 = tf(freq=14.0), with freq of:
14.0 = termFreq=14.0
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = fieldNorm(doc=440)


Then I modify my schema:










I just want to disable term freq > 1, so a term is either present or not.

public class NoTfSimilarity extends DefaultSimilarity {
public float tf(float freq) {
return freq > 0 ? 1.0f : 0.0f;
}
}

But I still see tf=14 in my query??
723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
  85.08203 = queryWeight, product of:
10.0 = boost
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = queryNorm
  8.5082035 = fieldWeight in 440, product of:
1.0 = tf(freq=14.0), with freq of:
  14.0 = termFreq=14.0
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = fieldNorm(doc=440)

Does anyone see what I am missing?
I am on Solr 4.0.

thanks
xavier
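
For reference, a minimal sketch of the schema wiring this thread converges on
(the analyzer chain below is only a placeholder, since the original snippet did
not survive the archive; the important parts are the per-fieldType similarity
and the global solr.SchemaSimilarityFactory that Steve mentions earlier in the
thread, with the custom class under a hypothetical com.example package):

<fieldType name="text_notf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- per-fieldType similarity: freq is still indexed, tf() is just clamped at query time -->
  <similarity class="com.example.NoTfSimilarity"/>
</fieldType>

<!-- global similarity factory; without it, per-fieldType similarities are ignored -->
<similarity class="solr.SchemaSimilarityFactory"/>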


Re: 4.0 hanging on startup on Windows after Control-C

2013-03-18 Thread xavier jmlucjav
Hi Shawn,

I am using DIH with a commit at the end... I'll investigate further to see if
this is what is happening and will report back; I will also check 4.2 (which I
had to do anyway...).
thanks for your input
xavier


On Mon, Mar 18, 2013 at 6:12 PM, Shawn Heisey  wrote:

> On 3/17/2013 11:51 AM, xavier jmlucjav wrote:
>
>> Hi,
>>
>> I have an index where, if I kill solr via Control-C, it consistently hangs
>> next time I start it. Admin does not show cores, and searches never
>> return.
>> If I delete the index contents and I restart again all is ok. I am on
>> windows 7, jdk1.7 and Solr4.0.
>> Is this a known issue? I looked in jira but found nothing.
>>
>
> I scanned your thread dump.  Nothing jumped out at me, but given my
> inexperience with such things, I'm not surprised by that.
>
> Have you tried 4.1 or 4.2 yet to see if the problem persists?  4.0 is no
> longer the new hotness.
>
> Below I will discuss the culprit that springs to mind, though I don't know
> whether it's what you are actually hitting.
>
> One thing that can make Solr take a really long time to start up is huge
> transaction logs.  Transaction logs must be replayed when Solr starts, and
> if they are huge, it can take a really long time.
>
> Do you have tlog directories in your cores (in the data dir, next to the
> index directory), and if you do, how much disk space do they use?  The
> example config in 4.x has updateLog turned on.
>
> There are two common situations that can lead to huge transaction logs.
>  One is exclusively using soft commits when indexing, the other is running
> a very large import with the dataimport handler and not committing until
> the very end.
>
> AutoCommit with openSearcher=false is a good solution to both of these
> situations.  As long as you use openSearcher=false, it will not change what
> documents are visible.  AutoCommit does a regular "hard" commit every X new
> documents or every Y milliseconds.  A hard commit flushes index data to
> disk and starts a new transaction log.  Solr will only keep a few
> transaction logs around, so frequently building new ones keeps their size
> down.  When you restart Solr, you don't need to wait for a long time while
> it replays them.
>
> Thanks,
> Shawn
>
>
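
For reference, a sketch of the autoCommit setup Shawn describes (this goes in
the updateHandler section of solrconfig.xml; the numbers are only placeholders
to be tuned to the indexing load):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit at least once a minute... -->
  <maxDocs>25000</maxDocs>            <!-- ...or every 25k docs, whichever comes first -->
  <openSearcher>false</openSearcher>  <!-- flush segments and roll the tlog without changing visibility -->
</autoCommit>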


4.0 hanging on startup on Windows after Control-C

2013-03-17 Thread xavier jmlucjav
Hi,

I have an index where, if I kill Solr via Control-C, it consistently hangs
the next time I start it. The admin does not show cores, and searches never
return. If I delete the index contents and restart again, all is OK. I am on
Windows 7, JDK 1.7 and Solr 4.0.
Is this a known issue? I looked in Jira but found nothing.
xavier

Here is a thread dump:

2013-03-17 17:58:33
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):

"JMX server connection timeout 30" daemon prio=6 tid=0x0bbf9000
nid=0x3b4c in Object.wait() [0x1df3e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xe7054338> (a [I)
at
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:168)
- locked <0xe7054338> (a [I)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

"RMI Scheduler(0)" daemon prio=6 tid=0x0bbf8000 nid=0x39d8 waiting
on condition [0x1db9f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xb9e1e6d8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

"RMI TCP Connection(1)-192.168.1.128" daemon prio=6 tid=0x0bbf7800
nid=0x111c runnable [0x1dd3e000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked <0xe70003c8> (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- <0xb959bc68> (a
java.util.concurrent.ThreadPoolExecutor$Worker)

"RMI TCP Accept-0" daemon prio=6 tid=0x0bbf5000 nid=0x1fe0 runnable
[0x1da4e000]
   java.lang.Thread.State: RUNNABLE
at java.net.DualStackPlainSocketImpl.accept0(Native Method)
at
java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:121)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:183)
- locked <0xb9531a78> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:522)
at java.net.ServerSocket.accept(ServerSocket.java:490)
at
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

"DestroyJavaVM" prio=6 tid=0x0bbf6800 nid=0x60c waiting on
condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"searcherExecutor-6-thread-1" prio=6 tid=0x0bbf6000 nid=0x3480 in
Object.wait() [0x1441e000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb9e6a4a0> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1379)
- locked <0xb9e6a4a0> (a java.lang.Object)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore

Re: Is there an EdgeSingleFilter already?

2013-03-17 Thread xavier jmlucjav
Steve, worked like a charm.
thanks!


On Sun, Mar 17, 2013 at 7:37 AM, Steve Rowe  wrote:

> See https://issues.apache.org/jira/browse/LUCENE-4843
>
> Let me know if it works for you.
>
> Steve
>
> On Mar 16, 2013, at 5:35 PM, xavier jmlucjav  wrote:
>
> > I read too fast your reply, so I thought you meant configuring the
> > LimitTokenPositionFilter. I see you mean I have to write one, ok...
> >
> >
> >
> > On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav  >wrote:
> >
> >> Steve,
> >>
> >> Yes, I want only "one", "one two", and "one two three", but nothing
> else.
> >> Cool if this can be achieved without java code even better, I'll check
> that
> >> filter.
> >>
> >> I need this for building a field used for suggestions, the user
> >> specifically wants no match only from the edge.
> >>
> >> thanks!
> >>
> >> On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe  wrote:
> >>
> >>> Hi xavier,
> >>>
> >>> It's not clear to me what you want.  Is the "edge" you're referring to
> >>> the beginning of a field? E.g. raw text "one two three four" with
> >>> EdgeShingleFilter configured to produce unigrams, bigrams and trigams
> would
> >>> produce "one", "one two", and "one two three", but nothing else?
> >>>
> >>> If so, I suspect writing a LimitTokenPositionFilter (which would stop
> >>> emitting tokens after the token position exceeds a specified limit)
> would
> >>> be better, rather than subclassing ShingleFilter.  You could use
> >>> LimitTokenCountFilter as a model, especially its "comsumeAllTokens"
> option.
> >>> I think this would make a nice addition to Lucene.
> >>>
> >>> Also, what do you plan to use this for?
> >>>
> >>> Steve
> >>>
> >>> On Mar 16, 2013, at 5:02 PM, xavier jmlucjav 
> wrote:
> >>>> Hi,
> >>>>
> >>>> I need to use shingles but only keep the ones that start from the
> edge.
> >>>>
> >>>> I want to confirm there is no way to get this feature without
> >>> subclassing
> >>>> ShingleFilter, cause I thought someone would have already encountered
> >>> this
> >>>> use case
> >>>>
> >>>> thanks
> >>>> xavier
> >>>
> >>>
> >>
>
>


Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
I read your reply too fast, so I thought you meant configuring the
LimitTokenPositionFilter. I see you mean I have to write one, OK...



On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav wrote:

> Steve,
>
> Yes, I want only "one", "one two", and "one two three", but nothing else.
> Cool if this can be achieved without java code even better, I'll check that
> filter.
>
> I need this for building a field used for suggestions, the user
> specifically wants no match only from the edge.
>
> thanks!
>
> On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe  wrote:
>
>> Hi xavier,
>>
>> It's not clear to me what you want.  Is the "edge" you're referring to
>> the beginning of a field? E.g. raw text "one two three four" with
>> EdgeShingleFilter configured to produce unigrams, bigrams and trigams would
>> produce "one", "one two", and "one two three", but nothing else?
>>
>> If so, I suspect writing a LimitTokenPositionFilter (which would stop
>> emitting tokens after the token position exceeds a specified limit) would
>> be better, rather than subclassing ShingleFilter.  You could use
>> LimitTokenCountFilter as a model, especially its "comsumeAllTokens" option.
>>  I think this would make a nice addition to Lucene.
>>
>> Also, what do you plan to use this for?
>>
>> Steve
>>
>> On Mar 16, 2013, at 5:02 PM, xavier jmlucjav  wrote:
>> > Hi,
>> >
>> > I need to use shingles but only keep the ones that start from the edge.
>> >
>> > I want to confirm there is no way to get this feature without
>> subclassing
>> > ShingleFilter, cause I thought someone would have already encountered
>> this
>> > use case
>> >
>> > thanks
>> > xavier
>>
>>
>


Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else.
Cool, if this can be achieved without Java code, even better; I'll check that
filter.

I need this for building a field used for suggestions; the user specifically
wants matching only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe  wrote:

> Hi xavier,
>
> It's not clear to me what you want.  Is the "edge" you're referring to the
> beginning of a field? E.g. raw text "one two three four" with
> EdgeShingleFilter configured to produce unigrams, bigrams and trigams would
> produce "one", "one two", and "one two three", but nothing else?
>
> If so, I suspect writing a LimitTokenPositionFilter (which would stop
> emitting tokens after the token position exceeds a specified limit) would
> be better, rather than subclassing ShingleFilter.  You could use
> LimitTokenCountFilter as a model, especially its "comsumeAllTokens" option.
>  I think this would make a nice addition to Lucene.
>
> Also, what do you plan to use this for?
>
> Steve
>
> On Mar 16, 2013, at 5:02 PM, xavier jmlucjav  wrote:
> > Hi,
> >
> > I need to use shingles but only keep the ones that start from the edge.
> >
> > I want to confirm there is no way to get this feature without subclassing
> > ShingleFilter, cause I thought someone would have already encountered
> this
> > use case
> >
> > thanks
> > xavier
>
>
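
A rough sketch of the filter Steve describes (this is just the basic shape, not
the actual code that ended up in LUCENE-4843):

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Stops emitting tokens once the token position goes past maxTokenPosition.
// Placed after a ShingleFilter with a limit of 1, only the shingles that start
// at the first position survive, which gives the edge-only behaviour wanted here.
public final class LimitTokenPositionFilter extends TokenFilter {

  private final int maxTokenPosition;
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private int position = 0;

  public LimitTokenPositionFilter(TokenStream input, int maxTokenPosition) {
    super(input);
    this.maxTokenPosition = maxTokenPosition;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;                          // upstream is exhausted
    }
    position += posIncAtt.getPositionIncrement();
    return position <= maxTokenPosition;     // false once we are past the limit
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    position = 0;
  }
}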


[ANN] vifun: a GUI to help visually tweak Solr scoring, release 0.6

2013-03-10 Thread xavier jmlucjav
Hi,

I am releasing a new version (0.6) of vifun, a GUI to help visually tweak
Solr scoring. The most relevant changes are:
- support float values
- add support for tie
- sync both Current/Baseline scrollbars (if some checkbox is selected)
- double-click on a doc: show a side-by-side comparison of debug score info
- upgrade to Griffon 1.2.0
- allow using another handler (besides /select)

You can check it out here: https://github.com/jmlucjav/vifun
Binary distribution:
http://code.google.com/p/vifun/downloads/detail?name=vifun-0.6.zip

xavier


Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-03-04 Thread xavier jmlucjav
Hi Mark,

Thanks for trying it out.

Let me see if I can explain it better: the number you have to select (in order
to be able to tweak it later with the slider) is any number in one of the
parameters in the Scoring section.

The issue you have is that you are using the /select handler from the example
distribution, and that handler does not have any of these parameters (qf,
pf, pf2, pf3, ps, ps2, ps3, bf, bq, boost, mm, tie), so it's normal that they
don't show up; there is nothing to tweak...

In the example configuration from 4.1 you can select the /browse handler, as
it uses qf and mm, and you should be able to tweak them. Of course, if you
were using a real Solr installation with a sizable number of documents and
some complex usage of edismax, you would be able to see much better what
the tool can do.

xavier


On Mon, Mar 4, 2013 at 10:52 PM, Mark Bennett
wrote:

> Hello Xavier,
>
> Thanks for uploading this and sharing.  I also read the other messages in
> the thread.
>
> I'm able to get part way through your Getting Started section, I get
> results, but I get stuck on the editing values.  I've tried with Java 6 and
> 7, with both the 0.5 binary and from the source distribution.
>
> What's working:
> * Default Solr 4.1 install  (plus a couple extra fields in schema)
> * Able to connect to Solr (/collection1)
> * Able to select handler (/select)
> * Able to run a search:
>   q=bandwidth
>   rows=10
>   fl=title
>   rest: pt=45.15,-93.85 (per your example)
> * Get 2 search results with titles
> * Able to select a result, mouse over, highlight score, etc.
>
> However, what I'm stuck on:
> * Below the Run Query button, I only see the grayed out Scoring slider.
> * The instructions say to highlight some numbers
>   - I tried highlighting the 10 in rows paramour
>   - I also tried the 45.15 in "rest", and some of the scores in the
> results list
>
> I never see the extra parameters you show in this screen shot:
>
> https://raw.github.com/jmlucjav/vifun/master/img/screenshot-selecttarget.jpg
> I see the word "Scoring:"
> I don't see the blue text "Select a number as a target to tweak"
> I don't see the parameters qf, bf_0, 1, 2, bq_0, etc.
>
> I'm not sure how to get those extra fields to appear in the UI.
>
> I also tried adding defType=edismax, no luck
>
> The Handlers it sees:
> /select, /query, /browse, /spell, /tvrh, /clustering, /terms,
> /elevate
> (from default Solr 4.1 solrconfig.xml)
> I'm using /select
>
>
> --
> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>
>
>
>
>
>
>
> On Feb 23, 2013, at 6:12 AM, jmlucjav  wrote:
>
> > Hi,
> >
> > I have built a small tool to help me tweak some params in Solr (typically
> > qf, bf in edismax). As maybe others find it useful, I am open sourcing it
> > on github: https://github.com/jmlucjav/vifun
> >
> > Check github for some more info and screenshots. I include part of the
> > github page below.
> > regards
> >
> > Description
> >
> > Did you ever spend lots of time trying to tweak all numbers in a
> *edismax*
> > handler *qf*, *bf*, etc params so docs get scored to your liking? Imagine
> > you have the params below, is 20 the right boosting for *name* or is it
> too
> > much? Is *population* being boosted too much versus distance? What about
> > new documents?
> >
> >
> >name^20 textsuggest^10 edge^5 ngram^2
> phonetic^1
> >33%
> >
> >recip(geodist(),1,500,0)
> >
> >    product(log(sum(population,1)),100)
> >
> >recip(rord(moddate),1,1000,1000)
> >
> > This tool was developed in order to help me tweak the values of boosting
> > functions etc in Solr, typically when using edismax handler. If you are
> fed
> > up of: change a number a bit, restart Solr, run the same query to see how
> > documents are scored now...then this tool is for you.
> > <https://github.com/jmlucjav/vifun#features>Features
> >
> >   - Can tweak numeric values in the following params: *qf, pf, bf, bq,
> >   boost, mm* (others can be easily added) even in * or
> >   *
> >   - View side by side a Baseline query result and how it changes when you
> >   gradually change each value in the params
> >   - Colorized values, color depends on how the document does related to
> >   baseline query
> >   - Tooltips give you Explain info
> >   - Works on remote Solr installations
>

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread jmlucjav
Hi Roman,

I read with interest your thread about relevance testing a couple of weeks
ago and yes, I noticed it was related somehow. But what you were proposing
there is a different approach I think.

In my tool, you have some baseline setting (it might be good or bad), and
using a single query, you can visually see how documents rank differently
when changing parameters. But the way to see the difference is the user's
eye, and I am not using any statistical measurement to compare both
settings. So it is a bit limited.

In your approach I understand that you have some data that allows you to
measure how well queries do (like clicks etc). So I think your approach is
more useful, but probably harder to achieve. Not sure how I could merge
that into my tool.
In order to test things myself, I am working on some Geb tests (
http://www.gebish.org/testing) to check things like Steffen proposes in
your thread.

regards
xavier


On Mon, Feb 25, 2013 at 8:23 PM, Roman Chyla  wrote:

> Oh, wonderful! Thank you :) I was hacking some simple python/R scripts that
> can do a similar job for qf... the idea was to let the algorithm create
> possible combinations of params and compare that against the baseline.
>
> Would it be possible/easy to instruct the tool to harvest results for
> different combinations and export it? I would like to make plots similar to
> those:
>
>
> https://github.com/romanchyla/r-ranking-fun/blob/master/plots/raw/test-plot-showing-factors.pdf?raw=true
>
> roman
>
> On Sat, Feb 23, 2013 at 9:12 AM, jmlucjav  wrote:
>
> > Hi,
> >
> > I have built a small tool to help me tweak some params in Solr (typically
> > qf, bf in edismax). As maybe others find it useful, I am open sourcing it
> > on github: https://github.com/jmlucjav/vifun
> >
> > Check github for some more info and screenshots. I include part of the
> > github page below.
> > regards
> >
> > Description
> >
> > Did you ever spend lots of time trying to tweak all numbers in a
> *edismax*
> >  handler *qf*, *bf*, etc params so docs get scored to your liking?
> Imagine
> > you have the params below, is 20 the right boosting for *name* or is it
> too
> > much? Is *population* being boosted too much versus distance? What about
> > new documents?
> >
> > 
> > name^20 textsuggest^10 edge^5 ngram^2
> > phonetic^1
> > 33%
> > 
> > recip(geodist(),1,500,0)
> > 
> > product(log(sum(population,1)),100)
> > 
> > recip(rord(moddate),1,1000,1000)
> >
> > This tool was developed in order to help me tweak the values of boosting
> > functions etc in Solr, typically when using edismax handler. If you are
> fed
> > up of: change a number a bit, restart Solr, run the same query to see how
> > documents are scored now...then this tool is for you.
> >  <https://github.com/jmlucjav/vifun#features>Features
> >
> >- Can tweak numeric values in the following params: *qf, pf, bf, bq,
> >boost, mm* (others can be easily added) even in * or
> >*
> >- View side by side a Baseline query result and how it changes when
> you
> >gradually change each value in the params
> >- Colorized values, color depends on how the document does related to
> >baseline query
> >- Tooltips give you Explain info
> >- Works on remote Solr installations
> >- Tested with Solr 3.6, 4.0 and 4.1 (other versions would work too, as
> >long as wt=javabin format is compatible)
> >    - Developed using Groovy/Griffon
> >
> >  <https://github.com/jmlucjav/vifun#requirements>Requirements
> >
> >- */select* handler should be available, and not have any *
> or
> >*, as it could interfere with how vifun works.
> >- Java6 is needed (maybe it runs on Java5 too). A JRE should be
> enough.
> >
> >  <https://github.com/jmlucjav/vifun#getting-started>Getting started
> > <
> >
> https://github.com/jmlucjav/vifun#click-here-to-download-latest-version-and-unzip
> > >Click
> > here to download latest
> > version<
> http://code.google.com/p/vifun/downloads/detail?name=vifun-0.4.zip
> > >
> > and
> > unzip
> >
> >- Run vifun-0.4\bin\vifun.bat or vifun-04\bin\vifun if on linux/OSX
> >- Edit *Solr URL* to match yours (in Sol4.1 default is
> >http://localhost:8983/solr/collection1 for example) [image: hander
> >selection]<
> > https://github.com/jmlucjav/vifun/raw/master/img/screenshot-handlers.jpg
> >
> >- *Show Handerls*, and select the handler you wish 

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread jmlucjav
Jan, thanks for looking at this!

- Running from source: would you care to send me the error you get (if any)
when running from source? I assume you have Griffon 1.1.0 installed, right?

- Binary dist: the distribution is created by Griffon, so I'll check whether the
permission issue is known or can be fixed somehow (I develop on Windows, and
tested on a clean Windows box too, so I don't hit the issue you mention).
I'll update the doc anyway.

- wt param: I am already overriding wt param (in order to use javabin).
What I didn't allow is to choose the handler to be used when submitting the
query. I guess any handler that does not have / that
would interfere would work fine, I just thought /select is mostly available
in most installations and that is one thing less to configure. But yes, I
could let the user configure it, I'll open an issue.

xavier

On Mon, Feb 25, 2013 at 3:10 PM, Jan Høydahl  wrote:

> Cool. I tried running from source (using the bundled griffonw), but I
> think the instructions may be wrong, had to download binary dist.
> The file permissions for bin/vifun in binary dist should have +w so you
> can execute it with ./vifun
>
> What about the ability to override the "wt" param, so that you can point
> it to the "/browse" handler directly?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 23. feb. 2013 kl. 15:12 skrev jmlucjav :
>
> > Hi,
> >
> > I have built a small tool to help me tweak some params in Solr (typically
> > qf, bf in edismax). As maybe others find it useful, I am open sourcing it
> > on github: https://github.com/jmlucjav/vifun
> >
> > Check github for some more info and screenshots. I include part of the
> > github page below.
> > regards
> >
> > Description
> >
> > Did you ever spend lots of time trying to tweak all numbers in a
> *edismax*
> > handler *qf*, *bf*, etc params so docs get scored to your liking? Imagine
> > you have the params below, is 20 the right boosting for *name* or is it
> too
> > much? Is *population* being boosted too much versus distance? What about
> > new documents?
> >
> >
> >name^20 textsuggest^10 edge^5 ngram^2
> phonetic^1
> >33%
> >
> >recip(geodist(),1,500,0)
> >
> >product(log(sum(population,1)),100)
> >
> >recip(rord(moddate),1,1000,1000)
> >
> > This tool was developed in order to help me tweak the values of boosting
> > functions etc in Solr, typically when using edismax handler. If you are
> fed
> > up of: change a number a bit, restart Solr, run the same query to see how
> > documents are scored now...then this tool is for you.
> > <https://github.com/jmlucjav/vifun#features>Features
> >
> >   - Can tweak numeric values in the following params: *qf, pf, bf, bq,
> >   boost, mm* (others can be easily added) even in * or
> >   *
> >   - View side by side a Baseline query result and how it changes when you
> >   gradually change each value in the params
> >   - Colorized values, color depends on how the document does related to
> >   baseline query
> >   - Tooltips give you Explain info
> >   - Works on remote Solr installations
> >   - Tested with Solr 3.6, 4.0 and 4.1 (other versions would work too, as
> >   long as wt=javabin format is compatible)
> >   - Developed using Groovy/Griffon
> >
> > <https://github.com/jmlucjav/vifun#requirements>Requirements
> >
> >   - */select* handler should be available, and not have any * or
> >   *, as it could interfere with how vifun works.
> >   - Java6 is needed (maybe it runs on Java5 too). A JRE should be enough.
> >
> > <https://github.com/jmlucjav/vifun#getting-started>Getting started
> > <
> https://github.com/jmlucjav/vifun#click-here-to-download-latest-version-and-unzip
> >Click
> > here to download latest
> > version<
> http://code.google.com/p/vifun/downloads/detail?name=vifun-0.4.zip>
> > and
> > unzip
> >
> >   - Run vifun-0.4\bin\vifun.bat or vifun-04\bin\vifun if on linux/OSX
> >   - Edit *Solr URL* to match yours (in Sol4.1 default is
> >   http://localhost:8983/solr/collection1 for example) [image: hander
> >   selection]<
> https://github.com/jmlucjav/vifun/raw/master/img/screenshot-handlers.jpg>
> >   - *Show Handerls*, and select the handler you wish to tweak from *
> >   Handerls* dropdown. The text area below shows the parameters of the
> >   handler.
> >   - Modify the values to run a baseline qu

rogue values in schema browser histogram

2012-12-28 Thread jmlucjav
Hi,

I have an index where the schema browser histogram reports some terms that I
never indexed. When you run a query for those terms you of course get none.
I optimized the index and have the same issue. The field is a TrieIntField.

I think I might have seen some post about this (or a similar) issue but did
not find anything in Jira; can anyone direct me to a ticket?

I am on Solr 4.0
regards






Re: optimun precisionStep for DAY granularity in a TrieDateField

2012-12-15 Thread jmlucjav
Without going through such rigorous testing, maybe for my case (interested
only in DAY) I could just index trielong values such as 20121010,
20110101, etc...

This would take less space than trieDate (I guess), and I would still have a
date-looking number (for easier handling). I could even base the days on
2000/01/01 and just index a single int (1..365, 366, ...), but I don't think
it's worth it for now; I prefer to keep an easier-to-understand number.

thanks
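
A tiny sketch of the two encodings mentioned above (the 2000-01-01 base and the
class/method names are just illustrations taken from this thread):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

// Turn a date into either a yyyyMMdd int (e.g. 20121010) or a day count since
// 2000-01-01, both suitable for a plain trie int/long field.
public class DayOnlyEncoding {

  // note: SimpleDateFormat is not thread-safe; fine for a one-off sketch
  private static final SimpleDateFormat YMD = new SimpleDateFormat("yyyyMMdd");
  static { YMD.setTimeZone(TimeZone.getTimeZone("UTC")); }

  public static int asYyyyMmDd(Date d) {
    return Integer.parseInt(YMD.format(d));
  }

  public static long asDaysSince2000(Date d) throws ParseException {
    long base = YMD.parse("20000101").getTime();
    return TimeUnit.MILLISECONDS.toDays(d.getTime() - base);
  }
}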





Re: optimun precisionStep for DAY granularity in a TrieDateField

2012-12-14 Thread jmlucjav
thanks Lance. 

I knew about rounding in the request params, but I want to know if there is
something to tweak at indexing time (by changing precisionStep in
schema.xml) in order to store only the needed information. At query time, yes,
I would round to /DAY.





optimun precisionStep for DAY granularity in a TrieDateField

2012-12-14 Thread jmlucjav
Hi

I have a TrieDateField in my index, where I will index dates (range
2000-2020). I am only interested in DAY granularity, that is, I don't
care about time (I'll index everything based on the same timezone).

Is there an optimal value for precisionStep that I can use so I don't index
info I will never use?? I have looked but have not found any info on
what values of precisionStep map to year/month/.../day/hour... (not sure if
the mapping is straightforward anyway).

thanks for the help.







Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread jmlucjav
If you are using DIH, it's just a matter of doing (for a MySQL project I have
around, for example) something like this:

 CONCAT(lat, ',', lon) as latlon







Re: Help with Velocity in SolrItas

2012-10-09 Thread jmlucjav
Paul Libbrecht-4 wrote
> PS: to stop this hell, I have a JSP pendant to the VelocityResponseWriter,
> is this something of interest for someone so that I contribute it?

Paul... yes it is! Anything that would help with Velocity-related issues is
welcome.





Re: How To apply transformation in DIH for multivalued numeric field?

2012-07-19 Thread jmlucjav
I have seen that issue several times; in my case it was always with an id
field, a MySQL db, and Linux. The same config on Windows did not show the
issue.

I never got to the bottom of it... as it was an id it worked anyway, since it
was unique.



Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread jmlucjav
There is at least one scenario where no error is reported when it should be:
if the host runs out of disk space while optimizing, it is not reported.

There is a Jira issue open for it, I think.



Re: adding an OR to a fq makes some doc that matched not match anymore

2012-05-15 Thread jmlucjav
oh yeah, forgot about negatives and *:*...
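For the record, a purely negative clause inside an OR matches nothing on its own, so the fix (standard practice, though not re-tested here) is to anchor the negation to all documents:

  fq=((*:* -type:B) OR name:aa)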
thanks



Re: adding an OR to a fq makes some doc that matched not match anymore

2012-05-15 Thread jmlucjav
that does not change the results for me:

-suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B))&debugQuery=true
-found 1

-suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B)+OR+name:aa)&debugQuery=true
-found 0

looks like a bug?
xab



adding an OR to a fq makes some doc that matched not match anymore

2012-05-14 Thread jmlucjav
Hi, 

I am trying to understand this scenario (Solr3.6):
- /suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B)
numFound=1

- I add an OR to the second fq. That fq is already satisfied by the found
doc, so adding an OR clause should still match it, right?
/suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B OR name:aa)
numFound=0

is there a logical explanation??
thanks 
xab




Re: java 1.6 requirement not documented clearly?

2012-04-23 Thread jmlucjav
Oh, then it should work with 1.5? OK, I know what happened then. I did not
see it happening myself, but he unzipped 3.6, started Solr with the example
config and got the error. He had Java 1.5, so I told him to upgrade and it
worked, so I assumed Solr required 1.6.

But this was on a Linux box, so most probably the Java 1.5 it was using was
GCJ...

thanks
xab



java 1.6 requirement not documented clearly?

2012-04-23 Thread jmlucjav
Both the wiki http://wiki.apache.org/solr/SolrInstall and the tutorial
http://lucene.apache.org/solr/api/doc-files/tutorial.html state that Java 1.5 is
required, but trying to run Solr 3.6 with Java 1.5 was giving a colleague a
cryptic error.

xab



property substitution not working with multicore

2012-04-18 Thread jmlucjav
Hi,

I cannot seem to get the configuration right for using a properties file per
core (with 3.6.0). The Solr 3 Enterprise Search Server book says this:
"This property substitution works in solr.xml, solrconfig.xml,
schema.xml, and DIH configuration files."

So my solr.xml is like this:
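(The <cores>/<core> markup below was stripped by the mailing list archive; roughly, a solr.xml wired for per-core property files would look like this, with core names, paths, and the dataDir substitution assumed rather than copied from the original:)

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="core0"
            properties="core0.properties"
            dataDir="${config.datadir}"/>
      <core name="core1" instanceDir="core1" properties="core1.properties"/>
    </cores>
  </solr>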

  

  

core0.properties is in multicore/core0 (I tried with an absolute path too,
but that does not work either).

And my properties file has:
config.datadir=c:\\tmp\\core0\\data
config.db-data.jdbcUrl=jdbc:mysql:localhost\\...
config.db-data.username=root
config.db-data.password=

None of those values are taken into account. I think I read in Jira that DIH
does not support properties, but as the book says it does, I tried it anyway.
The path to the data dir should work, right? But not even that one works; I
always get the index in ./tmp/solr_data.

any hints?
xab



a way to transform doc and send it to a second core?

2012-04-16 Thread jmlucjav
Is there some way to index docs (extracted from the main document) in a second
core while Solr is indexing the main document in a first core?

I guess it can be done by an UpdateProcessor in /core0 that prepares the new
docs and just calls /core1/update but maybe someone has already done this in
a better way/predefined hook?

The reason behind this is that I want to split a multivalued field into a
non-multivalued field, each value becoming a different document.
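In case it helps frame the UpdateProcessor route: a sketch of the wiring on the first core, where com.example.SplitToSecondCoreProcessorFactory is a hypothetical custom class that would build one document per value and post them to /core1/update (the Java itself not shown):

  <!-- solrconfig.xml of core0 -->
  <updateRequestProcessorChain name="split-to-core1">
    <processor class="com.example.SplitToSecondCoreProcessorFactory">
      <str name="sourceField">myMultiValuedField</str>
      <str name="targetCoreUrl">http://localhost:8983/solr/core1</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">split-to-core1</str>
    </lst>
  </requestHandler>

(The chain is selected with update.chain on 3.5/3.6; older versions used update.processor.)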

thanks
xab 



Re: Faceting and Variable Buckets

2012-04-16 Thread jmlucjav
have a look at
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting
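For example (field and ranges invented for illustration), arbitrary buckets can be requested like this:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
      &facet.query=price:[0 TO 100]
      &facet.query=price:[100 TO 500]
      &facet.query=price:[500 TO *]

Each facet.query comes back with its own count, so the buckets can be as irregular as you need.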



Re: Suggester not working for digit starting terms

2012-04-12 Thread jmlucjav
Well, now I am really lost...

1. Yes, I want to suggest whole sentences too; I want the tokenizer to be
taken into account, and apparently it is working for me in 3.5.0. I get
suggestions like "foo bar abc". Maybe what you mention only applies to
file-based dictionaries? I am using the field itself.

2. But for the digit issue, nothing is suggested in that case, not even the
term 500, which is there because I can find it with this query:
http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500

I tried setting threshold to 0 in case the term was being removed, and it is
not that.

Moving to 3.6.0 is not a problem (I had already downloaded the RC, actually),
but I still see weird things here.

xab



Re: Suggester not working for digit starting terms

2012-04-11 Thread jmlucjav
Just to be sure, I reproduced this with the example config from 3.5:

1. add to schema.xml the a_suggest field and its field type (the XML was
stripped by the mailing list archive; a best-guess reconstruction is shown
after the steps below)
2. add to solrconfig.xml (the tags were likewise stripped; the surviving values
are listed below)


a_suggest
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FSTLookup
a_suggest

true
100




true
a_suggest
true
5
true


suggest


3. wipe the data directory and index the sample docs
4. 
http://localhost:8983/solr/suggest?&q=720&debugQuery=true   --- 0 results
http://localhost:8983/solr/select/?q={!prefix%20f=a_suggest}720 ---
1 result
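(The XML tags in steps 1 and 2 were stripped by the mailing list archive. The following is a best-guess reconstruction from the surviving values and the usual Suggester setup of that era; the analyzer chain and element names such as weightBuckets are assumptions, not a copy of the original.)

  <!-- step 1, schema.xml -->
  <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="a_suggest" type="text_suggest" indexed="true" stored="true"/>

  <!-- step 2, solrconfig.xml -->
  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">a_suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">a_suggest</str>
      <str name="buildOnCommit">true</str>
      <int name="weightBuckets">100</int>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">a_suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>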




Re: using solr to do a 'match'

2012-04-11 Thread jmlucjav
I have done that by getting the top X hits, finding the best match among them
(a combination of Levenshtein distance, contains checks... tweaking the code until
testing showed good results), and then deciding whether the candidate was a match
or not, again based on custom code plus a user-defined leniency value.

xab



Re: Suggester not working for digit starting terms

2012-04-10 Thread jmlucjav
I have double checked and still get the same behaviour. My field is:
(the field and field type definitions were stripped by the mailing list archive)
Analysis shows the numbers are there; for '500 $' I get, as the last step both at
index and query time:

org.apache.solr.analysis.TrimFilterFactory {luceneMatchVersion=LUCENE_35}
  position      1
  term text     500 $
  startOffset   0
  endOffset     5

So I still see something going wrong here
xab



Suggester not working for digit starting terms

2012-04-07 Thread jmlucjav
Hi,

I am using the Suggester component, as advised in the Solr 3 book (using Solr 3.5);
it is the same setup shown in the reply above, with the XML tags again stripped by
the archive:


a_suggest
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FSTLookup
a_suggest
true
100




true
a_suggest
true
5
true


suggest



But even though it works fine with words, it seems it does not work for terms
starting with digits. For example:
http://localhost:8983/solr/suggest?&q=500
gets 0 results, but I know '500 $' is in the a_suggest field, as I can find
many hits by:
http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500

Am I missing something? I have been trying to play with
spellcheck.onlyMorePopular and spellcheck.accuracy but I get the same
results.

thanks
xab



Re: Incremantally updating a VERY LARGE field - Is this possibe ?

2012-04-04 Thread jmlucjav
Depending on your JVM version, -XX:+UseCompressedStrings can help alleviate
the problem. It did help me before.
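For instance (only on JVMs that actually support the flag, roughly Java 6u21 and later; it was removed again in Java 7):

  java -Xmx1g -XX:+UseCompressedStrings -jar start.jar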

xab



Re: space making it hard tu use wilcard with lucene parser

2012-04-04 Thread jmlucjav
thanks, that will work I think



space making it hard tu use wilcard with lucene parser

2012-04-03 Thread jmlucjav
Hi,

I have a field type simpletext (its definition was stripped by the mailing list
archive):

I have such a field: name="fsuggest" type="simpletext"

I index there a value like this (between []): [wi/rac/house aa bbb]
I can see in the analysis page that it is indexed as [wi/rac/house aa bbb]

I have a handler (its XML was stripped by the mailing list archive; the surviving
default values below are "lucene" and "explicit"):
  
 
   lucene
explicit
 

  
I query: 
http://localhost:8983/solr/select/?qt=/suggest&q=fsuggest:wi/rac/ho*  -- I
get the doc
http://localhost:8983/solr/select/?qt=/suggest&q=fsuggest:wi/rac/house a* --
I don't get that doc, and I get other docs that have fsuggest:abc

Is there a way to query for: fsuggest starting with 'wi/rac/house a' ??
I am in Solr3.5

I could not find a way
thanks
xab
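Two hedged options that sidestep the whitespace problem (neither confirmed as what was eventually used, and both bypass query-time analysis, so the text must match the indexed token exactly): escape the space for the lucene parser, or hand the raw prefix to the prefix query parser:

  q=fsuggest:wi/rac/house\ a*
  q={!prefix f=fsuggest}wi/rac/house a

(both URL-encoded as needed in an actual request)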




Re: solritas 'timestamp' parameter in call to /terms

2012-03-13 Thread jmlucjav
I suspected it was there to avoid caching, but I wondered what the harm would be
in letting caching happen at the HTTP level if these are just suggestions; I would
say it would even be better.

So I can remove it...

thanks



solritas 'timestamp' parameter in call to /terms

2012-03-13 Thread jmlucjav
Hi,

I am studying solritas with the browse UI that comes with the 3.5.0 example. I
have noticed that the calls to /terms to get autocompletion terms have a
'timestamp' parameter.

What is it for? I did not find any such param in the Solr docs.

Can it safely be removed?

thanks



Geonames/spatial stuff usable in 3.5?

2012-03-04 Thread jmlucjav
Hi,

I was looking for a way to use spatial search given a location name (like
'dallas,tx'), and also given an IP, and I found
http://lucene.472066.n3.nabble.com/Spatial-Geonames-and-extension-to-Spatial-Solution-for-Solr-tc1311813.html
this post by Chris Mattmann mentioning some work with Geonames that seems
to do all I need. The work is spread over 8 different issues and was implemented
for 3.2.

I have the following questions:

Has anyone used this and do you have some feedback about it?

Is the IP-to-lat/lon ready to run out of the box, or does it need some external
service?

If I wanted to use this in 3.5, would the easiest way be to apply all the patches
to current 3.5? Would they apply cleanly, or probably not?

Are there any plans to incorporate this into Solr itself, or will it remain a set
of patches?

thanks,
javi




Re: lucene operators interfearing in edismax

2012-02-21 Thread jmlucjav
Ok thanks.

But I reviewed some of my searches and the - was not surrounded by
whitespace in all cases, so I'll have to remove Lucene operators myself
from the user input. I understand there is no predefined way to do so.



lucene operators interfearing in edismax

2012-02-20 Thread jmlucjav
Hi,

I am using edismax with end-user-entered strings. One search was not finding
what appeared to be the best match. The search was:

Sage Creek Organics - Enchanted

If I remove the -, the doc I want is found with the best score. It turns out (I
think) the - is the culprit, as the best match has 'enchanted' and the - turns
it into 'NOT enchanted'.

Is my analysis correct? I tried looking at the debug output but saw no NOT
entries there...

If so, is there a standard way (some filter) to remove Lucene operators from
user-entered queries? I thought this must be a common need.

thanks
javi



Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav
Yes, I am using https://github.com/alexwinston/RunJettyRun which apparently is
a fork of the original project, created out of the need to use a
jetty.xml.

So I am already setting an additional jetty.xml; this can be done in the Run
configuration, no need for a -D param. But as I mentioned, Solr does not
start cleanly if I do that.

So I wanted to understand what role /etc/jetty.xml plays
- when Solr is started via 'java -jar start.jar'
- when started with RunJettyRun in Eclipse.





usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav
Hi,

I am following
http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse
in order to be able to debug Solr in eclipse. I got it working fine.

Now, I usually use ./etc/jetty.xml to set the logging configuration. When
starting Jetty in Eclipse I don't see any log files created, so I guessed
jetty.xml was not being used. So I added it to the RunJetty Advanced
configuration (Additional jetty.xml), but in that case something goes wrong:
I get a 'java.net.BindException: Address already in use: JVM_Bind' error,
as if something is started twice.

So my question is: can jetty.xml be used while debugging in Eclipse? If so,
how? I would like to use the same configuration I use when I am just
changing xml stuff in Solr and starting with 'java -jar start.jar'.

thanks in advance


