solrcloud without faceting, i.e. for failover only

2015-01-06 Thread Will Milspec
Hi all,

We have a smallish index that performs well for searches and are
considering using solrcloud --but just for high availability/redundancy,
i.e. without any sharding.

The indexes would be replicated, but not distributed.

I know that "there are no stupid questions..Only stupid people"...but here
goes:

- Is solrcloud without sharding done in practice? (Or is it "just not done!!"?)
- Any downsides (i.e. aside from the lack of horizontal scalability)?

will


Re: null pointer on FSTCompletionLookup

2014-06-05 Thread Will Milspec
Update: this was a configuration error.

In my haste/carelessness, instead of defining separate "spellcheck" and
"suggest" components, I defined only "suggest". (More specifically, I
copied over the ch10 examples from "Solr in Action", but did not copy the
"spellcheck" component.)

When solr complained about not finding the 'spellcheck' component, I looked
over my (bad) solrconfig.xml and thought "hmm, the 'spellcheck' component
probably should be 'suggest'". It worked after re-indexing and appeared to
function correctly.

Cracking open the hard copy, sitting down in the easy chair, looking
carefully over the chapter brought the issue to my attention.

thanks for your patience...


Re: null pointer on FSTCompletionLookup

2014-06-04 Thread Will Milspec
Hi all,

I know this probably seems like an uninteresting problem and smells, even
to me,  like a stupid/newbie mis-configuration [Yes. I am reading the
excellent solr in action and  trying my hand at applying the "suggestion
examples"], but I looked a bit into this tonight, fired up the debugger,
stepped through code, etc to try to find where I erred:  to no avail.

Some questions:

First, does the SpellCheck component's "FSTLookupFactory" require any extra
special configuration, e.g. term vectors for the field ("suggest" below),
etc.:
org.apache.solr.spelling.suggest.fst.FSTLookupFactory
suggest

Second, why does the FSTCompletionLookup not check for nulls here for these
variables: higherWeightsCompletion and normalCompletion?

if (higherWeightsFirst) {
  completions = higherWeightsCompletion.lookup(key, num);
} else {
  completions = normalCompletion.lookup(key, num);
}

[Stepping through the code, I saw it execute this constructor:

  /**
   * This constructor prepares for creating a suggested FST using the
   * {@link #build(TermFreqIterator)} method.
   *
   * @param buckets
   *  The number of weight discretization buckets (see
   *  {@link FSTCompletion} for details).
   *
   * @param exactMatchFirst
   *  If true exact matches are promoted to the top of the
   *  suggestions list. Otherwise they appear in the order of
   *  discretized weight and alphabetical within the bucket.
   */
  public FSTCompletionLookup(int buckets, boolean exactMatchFirst) {

This constructor never initializes the  two *Completion variables ]


Third: I got inconsistent results. If I started solr afresh, this error
appeared. If I reindexed my test site and then executed my 'problematic
searches', the problem went away. Why would this happen?
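
To make the third question concrete, here is a minimal sketch of how such a symptom could arise. It is NOT Lucene's code: LazyLookupSketch, build and lookup are hypothetical stand-ins, and the null guard is only an assumption about what a defensive version might look like. The idea is that a structure assigned only inside build() stays null until something (e.g. a reindex) triggers a build.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a lookup whose data structure exists only after build(). */
final class LazyLookupSketch {
    // Stays null until build() runs, like the uninitialized *Completion fields.
    private List<String> completions;

    /** Builds the in-memory structure from the indexed terms. */
    void build(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        Collections.sort(sorted);
        completions = sorted;
    }

    /** Returns up to num terms starting with key; an empty list if never built. */
    List<String> lookup(String key, int num) {
        if (completions == null) {
            // Guard (assumption): without it, the loop below would throw an NPE.
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        for (String term : completions) {
            if (out.size() >= num) {
                break;
            }
            if (term.startsWith(key)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LazyLookupSketch lookup = new LazyLookupSketch();
        System.out.println(lookup.lookup("so", 5)); // fresh start: nothing built yet
        lookup.build(Arrays.asList("solr", "lucene", "solrcloud"));
        System.out.println(lookup.lookup("so", 5)); // after a build: matching terms
    }
}
```

If the real component behaves like this, "works after reindexing, fails after a fresh start" would simply mean the build step had not run yet.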

Thanks in advance





On Wed, Jun 4, 2014 at 9:32 AM, Will Milspec  wrote:

> Hi all,
>
> Someone posted this problem over a year ago but I did not see a clear
> resolution in the thread.
>
> Intermittently--i.e. for some searches, not others--the
> 'suggest/spellcheck' component throws a NullPointerException (NPE) when a
> user executes a search. It fails on FSTCompletionLookup (line 244).
>
> I'm using solr 4.4 (to match "what's in production"). I could upgrade if
> necessary.
>
> Any hints on why it occurs and how to fix? The earlier post alluded to
> "changing the field type solved the problem", but did not provide details.
>
> Thanks
>
> will
>
> /select request handler:
> 
>
>on
>   suggestDictionary
>   false
>   5
>   2
>   5
>   true
>   true
>   5
>   3
>
> spellcheck component:
> 
>
> 
> 
> suggestDictionary
>  name="classname">org.apache.solr.spelling.suggest.Suggester
>  name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory
> title
>  
> 0.
> true
> 
> 
>
> field type definition:
> 
>
>  positionIncrementGap="100">
>   
> 
> 
>  words="stopwords.txt" />
> 
> 
>   
>   
> 
>  words="stopwords.txt" />
>  ignoreCase="true" expand="true"/>
> 
>   
> 
>
> field definition:
> 
>
>  multiValued="false" omitNorms="false"/>
>
> It fails here:
> ===
> Here's the line that fails.
>
> @Override
>   public List lookup(CharSequence key, boolean higherWeightsFirst, int num) {
>     final List completions;
>     if (higherWeightsFirst) {
>       completions = higherWeightsCompletion.lookup(key, num);
>     } else {
>       completions = normalCompletion.lookup(key, num); // <-- fails on this line
>     }
>
>


null pointer on FSTCompletionLookup

2014-06-04 Thread Will Milspec
Hi all,

Someone posted this problem over a year ago but I did not see a clear
resolution in the thread.

Intermittently--i.e. for some searches, not others--the
'suggest/spellcheck' component throws a NullPointerException (NPE) when a
user executes a search. It fails on FSTCompletionLookup (line 244).

I'm using solr 4.4 (to match "what's in production"). I could upgrade if
necessary.

Any hints on why it occurs and how to fix? The earlier post alluded to
"changing the field type solved the problem", but did not provide details.

Thanks

will

/select request handler:


   on
  suggestDictionary
  false
  5
  2
  5
  true
  true
  5
  3

spellcheck component:




suggestDictionary
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FSTLookupFactory
title
 
0.
true



field type definition:



  





  
  




  


field definition:




It fails here:
===
Here's the line that fails.

@Override
  public List lookup(CharSequence key, boolean higherWeightsFirst, int num) {
    final List completions;
    if (higherWeightsFirst) {
      completions = higherWeightsCompletion.lookup(key, num);
    } else {
      completions = normalCompletion.lookup(key, num); // <-- fails on this line
    }


solr multi-tenant: anyone use per-tenant synonyms file?

2014-06-02 Thread Will Milspec
Hi all,

I've been reading up on solr cloud (via solr in action) with an eye toward
multi-tenancy. (Read: "solrcloud newbie")

One question that came up: what if a "one size fits all" synonyms file does
not work for all customers? I.e., different customers/industries use
different sets of synonyms.

Example
- "bond=loan" for banking
- "bond=adhere" for manufacturing

In "non-cloud" solr we would use separate solr cores with identical schemas
but different 'synonyms.txt' files. Does anyone do the equivalent (a
per-tenant synonyms file) in solrcloud?

thanks

will


localizing 'display names' for facet values

2014-05-30 Thread Will Milspec
Hi all,

What's the cleanest way to solve this problem:  localize the 'display
names' for facet values without storing the localized names in solr.

Example:
 -store 'country code' field in solr document
 -facet on country code
 -translate the country code based on the user's locale

For the facets, the English user would see:

   England        10
   France         20
   United States   5

and the French user would see:

   Angleterre     10
   France         20
   Etats Unis      5

Reading through Solr in Action, I don't see that solr has any 'native' tool
to 'decode facet names'. I see that the 'key' local param will relabel the
facet name, but not the actual values.
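
For what it's worth, one client-side approach could look like the sketch below: keep the country code in solr, facet on it, and translate the values after the response comes back. It assumes the codes are ISO 3166 two-letter codes and leans on the JDK's own locale data; FacetLabelSketch and its method names are made up for illustration and are not part of solr or solrj.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: translate country-code facet values into display names on the client. */
final class FacetLabelSketch {

    /** Returns the display name of an ISO 3166 country code in the user's language. */
    static String countryName(String code, Locale userLocale) {
        Locale country = new Locale.Builder().setRegion(code).build();
        return country.getDisplayCountry(userLocale);
    }

    /** Relabels facet counts keyed by country code; the counts are left untouched. */
    static Map<String, Long> relabel(Map<String, Long> countsByCode, Locale userLocale) {
        Map<String, Long> labelled = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : countsByCode.entrySet()) {
            labelled.put(countryName(entry.getKey(), userLocale), entry.getValue());
        }
        return labelled;
    }

    public static void main(String[] args) {
        // Facet counts as they might come back from solr, keyed by stored code.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("GB", 10L);
        counts.put("FR", 20L);
        counts.put("US", 5L);
        System.out.println(relabel(counts, Locale.ENGLISH));
        System.out.println(relabel(counts, Locale.FRENCH));
    }
}
```

The exact names come from the JDK's locale data, so they may differ from the wording in the example above; for values that are not country codes, a ResourceBundle per locale would play the same role.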

Additionally: we are interested in using AjaxSolr in the medium term
future. Between the library and/or javascript, does ajax-solr offer
additional techniques?

I wonder if anyone could recommend a clean solution.

thanks in advance,

will


overhead of empty, unused fields

2011-08-18 Thread Will Milspec
hi all,

What is the cost of unused fields and field types?

Our application supports multiple languages. We envision separate
Lucene/Solr fields (and field types) per language (content_en, content_fr,
content_zh_CN, etc).

We thought of a few options:
a) auto-generate the 'multilingual' portion of the schema based on the
application's languages,
b) include fields-and-types for all languages.


In A, if an implementation only used French and Chinese, the schema would
only have content_fr and content_zh_CN fields-and-types.

In B, the implementation would have all field types, but a given document
would only have two fields.

A seems "more efficient", but is more work. The downside: if a user wants to
add a language, they would need to regenerate the schema (i.e. add
fields-and-types for "ja").
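
A rough sketch of what option A's generation step might look like, assuming one field per language named "content_<code>" as above. The "text_<code>" type naming and the indexed/stored attributes are hypothetical choices for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: generate the per-language portion of a schema from a language list. */
final class LanguageFieldSketch {

    /** Builds the field name for one language code, e.g. "content" + "_" + "fr". */
    static String fieldFor(String baseName, String languageCode) {
        return baseName + "_" + languageCode;
    }

    /** Emits one field declaration line per configured language. */
    static List<String> schemaFields(String baseName, List<String> languageCodes) {
        List<String> lines = new ArrayList<>();
        for (String code : languageCodes) {
            // The type name "text_<code>" is a hypothetical naming convention.
            lines.add("<field name=\"" + fieldFor(baseName, code)
                    + "\" type=\"text_" + code
                    + "\" indexed=\"true\" stored=\"true\"/>");
        }
        return lines;
    }

    public static void main(String[] args) {
        // An implementation that only uses French and Chinese.
        for (String line : schemaFields("content", Arrays.asList("fr", "zh_CN"))) {
            System.out.println(line);
        }
    }
}
```

Adding "ja" later would then mean re-running the generator with a longer list, which is exactly the regeneration step mentioned as the downside.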


How much do empty field types and fields cost? Do a dozen-or-so unused field
types hurt scalability of indexing or search?

thanks,

will


Synonym and Whitespaces and optional TokenizerFactory

2011-08-17 Thread Will Milspec
Hi all,

This may be obvious. My question pertains to use of tokenizerFactory
together with SynonymFilterFactory. Which tokenizerFactory does one use to
treat "synonyms with spaces" as one token?

Example: these two entries are synonyms: "lms", "learning management system"

Index-time expansion would expand "lms" to these terms
   "lms"
   "learning management system"

i.e. not  like this:
   "lms"
   "learning"
   "management"
   "system"

Excerpt from the wiki article:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

The optional *tokenizerFactory* parameter names a tokenizer factory class to
analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which
can help with the synonym+stemming problem described in
http://search-lucene.com/m/hg9ri2mDvGk1 .


thanks,

will


how to build lucene-solr (especially if behind a firewall)?

2011-07-12 Thread Will Milspec
hi all,

building lucene/solr behind the firewall fails for us due to proxy errors.

I tried setting ANT_OPTS with -Dhttp.proxyHost, etc., but found the "lucene"
portion still failed on javadoc links.

I worked around this by changing failonjavadocerror to 'false' in
lucene/common-build.xml (or alternatively adding -J-Dhttp.proxyHost, etc. as
"args" element to the javadoc tasks), but then 'changes2html' failed to
connect to https://issues.apache.org.

I'm posting to the solr-user group (even though compiling is developer-ish
stuff) as we need to apply a few patches to lucene-solr.

Would someone be so kind as to post the following?
* Easiest way to build lucene-solr from source
* same, but if you're behind the firewall.

thanks,

will


Any chance of getting SOLR-949 into the application

2011-07-07 Thread Will Milspec
hi all,

Our application requires term vectors and uses the SOLR-949 solrj patch to
simplify the client layer. This patch eliminates the need to manually parse
the xml returned by the tvrh (term vector request handler):
   https://issues.apache.org/jira/browse/SOLR-949

Can we get this in the head/trunk?

Re-patching after each solr upgrade is a bit error prone.

thanks

will


Git tag for 3.1 release?

2011-04-18 Thread Will Milspec
Hi all,

Does the lucene-solr git repository have a tag that marks the 3.1 release?

Context:  I want to apply a patch to 3.1 and wish to start from a
well-defined point (i.e. official 3.1 release)

Executing these commands, I would have expected to see a tag marking the 3.1
release.  I only see "before_flex_merge", however.

$ git checkout lucene_solr_3_1
Checking out files: 100% (3831/3831), done.
Switched to branch 'lucene_solr_3_1'

$ git tag
before_flex_merge

thanks

will


SOLR-236 (Field Collapsing) patch and 3.1

2011-04-08 Thread Will Milspec
Hi all,

We're using the SOLR-236 (field collapsing) patch on solr 1.4.1 and wish to
upgrade to 3.1.

Has anyone applied this patch to 3.1, successfully or unsuccessfully?

[ftr, Solr 4.x includes field collapsing; 3.1 does not ]

The issue has several patch files, including some for 1.4.1 specifically. I
don't see one for 3.1 specifically.

I can go ahead and apply it, but wanted to check for any "known 3.1 issues".

jira:
https://issues.apache.org/jira/browse/SOLR-236

thanks,

will


Anyone seen measurable performance improvement using Apache Portable Runtime (APR) with Solr and Tomcat

2011-01-12 Thread Will Milspec
Hi all,

Has anyone used Apache Portable Runtime (APR) in conjunction with Solr
and Tomcat? Has anyone seen (or better, measured) performance improvements
when using APR?

APR is a library that implements some functionality in native C (see
http://apr.apache.org/ and
http://en.wikipedia.org/wiki/Apache_Portable_Runtime).

From the Wikipedia entry:

The range of platform-independent functionality provided by APR includes:
* Memory allocation and memory pool functionality
* Atomic operations
* Dynamic library handling
* File I/O
* Command argument parsing
* Locking
* Hash tables and arrays
* Mmap functionality
* Network sockets and protocols
* Thread, process and mutex functionality
* Shared memory functionality
* Time routines
* User and group ID services


I could imagine benefits in file I/O as well as network I/O, but that's pure
conjecture.

Comments?

thanks in advance


Re: Where does admin UI visually distinguish between "master" and "slave"?

2011-01-12 Thread Will Milspec
Hi all,

Thanks for the feedback. I've checked the code with a few different inputs
and believe I have found a bug.

Could someone comment as to whether I'm missing something? I will go
ahead and file it if someone can attest "looks like a bug".

Bug Summary:
==
- Admin UI replication/index.jsp checks for master or slave with the
following code:
   if ("true".equals(detailsMap.get("isSlave")))
- if true (slave), replication/index.jsp displays the "Master", "Poll
Intervals", etc. sections (everything up to "Cores")
- if false, replication/index.jsp does not display the "Master" and "Poll
Intervals" sections
- This "slave check/UI difference" works correctly if the solrconfig.xml has
a "slave" but not "master" section or vice versa

Expected results:
==
Same UI difference would occur in the following scenario:
   a) solrconfig.xml has both master and slave entries
   b) use java system properties (-Dsolr.enable.master -Dsolr.enable.slave)
to set "master" or "slave" at runtime

*OR*
c) use solrcore.properties  to set "master" and "slave" at runtime

Actual results:
==
If solrconfig.xml has both master and slave entries, replication/index.jsp
shows both "master" and "slave" sections regardless of the system properties.

On Wed, Jan 12, 2011 at 10:35 AM, Markus Jelsma
wrote:

> Well, slaves do show different things in the replication.jsp page.
>
> Master          http://10cc:8080/solr/replication
> Poll Interval   00:00:10
> Local Index     Index Version: 1294666552434, Generation: 2515
>                 Location: /var/lib/solr/data/index
>                 Size: 4.65 GB
>                 Times Replicated Since Startup: 934
>
> Where master nodes (or slaves where enabled=false) show:
>
> Local Index     Index Version: 1294666552449, Generation: 2530
>                 Location: /var/lib/solr/data/index
>                 Size: 4.65 GB
>
> On Wednesday 12 January 2011 17:24:57 Otis Gospodnetic wrote:
> > Hi Will,
> >
> > I don't think we have a clean "master" or "slave" label anywhere in the
> > Admin UI.
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > - Original Message 
> >
> > > From: Will Milspec 
> > > To: solr-user@lucene.apache.org
> > > Sent: Wed, January 12, 2011 11:18:17 AM
> > > Subject: Where does admin UI visually distinguish between "master" and
> >
> > "slave"?
> >
> > > Hi all,
> > >
> > > I'm getting started with a master/slave configuration for two solr
> > > instances. To distinguish between 'master' and 'slave', I've set the
> > > system properties (e.g. "-Dmaster.enabled") and am using the same
> > > 'solrconfig.xml'.
> > >
> > > I can see via the system properties admin UI that the  jvm (and thus
> > > solr) sees correct values, i.e.:
> > > enable.master =  false
> > > enable.slave = true
> > >
> > > However, the replication admin UI is  identical for both 'master' and
> > > 'slave'. (i.e.
> > > http://localhost:8983/solr/production/admin/replication/index.jsp)
> > >
> > > I'd like a clearer visual confirmation that the master node is indeed
> > > a master and the slave is a slave.
> > >
> > > Summary question:
> > > Does the admin UI distinguish between "master and slave"?
> > >
> > > thanks
> > >
> > > will
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Where does admin UI visually distinguish between "master" and "slave"?

2011-01-12 Thread Will Milspec
Hi all,

I'm getting started with a master/slave configuration for two solr
instances. To distinguish between 'master' and 'slave', I've set the system
properties (e.g. "-Dmaster.enabled") and am using the same 'solrconfig.xml'.

I can see via the system properties admin UI that the jvm (and thus solr)
sees correct values, i.e.:
enable.master = false
enable.slave = true

However, the replication admin UI is identical for both 'master' and
'slave'. (i.e.
http://localhost:8983/solr/production/admin/replication/index.jsp)

I'd like a clearer visual confirmation that the master node is indeed a
master and the slave is a slave.

Summary question:
Does the admin UI distinguish between "master and slave"?

thanks

will


Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year' , ala google news?

2010-12-10 Thread Will Milspec
hi all,

We wish to implement date faceting with a 'sliding date range': 'last 24
hours, last week, last month, last year'. Google News currently implements
such faceting when you search for a topic.

As Solr's standard date faceting does not appear to meet this need, we will
need to use faceting on arbitrary queries, i.e. by passing multiple values
for facet.query.
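
A sketch of how the multiple facet.query values might be assembled, assuming a date field called "published" (a made-up name) and Solr date math for the ranges. The rounding (NOW/HOUR, NOW/DAY) is there so that requests made within the same hour or day ask for identical ranges, which should give solr's caches a chance to reuse counts; that benefit is an assumption to measure, not something verified here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: build 'sliding window' facet.query parameters for a date field. */
final class SlidingDateFacetSketch {

    /** Maps a facet key to a date-math range query over the given field. */
    static Map<String, String> slidingRanges(String dateField) {
        Map<String, String> ranges = new LinkedHashMap<>();
        ranges.put("last_24_hours", dateField + ":[NOW/HOUR-24HOURS TO NOW/HOUR+1HOUR]");
        ranges.put("last_week", dateField + ":[NOW/DAY-7DAYS TO NOW/DAY+1DAY]");
        ranges.put("last_month", dateField + ":[NOW/DAY-1MONTH TO NOW/DAY+1DAY]");
        ranges.put("last_year", dateField + ":[NOW/DAY-1YEAR TO NOW/DAY+1DAY]");
        return ranges;
    }

    /** Renders the ranges as URL parameters, one facet.query per range. */
    static String toQueryString(Map<String, String> ranges) {
        StringBuilder sb = new StringBuilder("facet=true");
        for (Map.Entry<String, String> entry : ranges.entrySet()) {
            // {!key=...} labels the facet in the response instead of the raw query.
            String value = "{!key=" + entry.getKey() + "}" + entry.getValue();
            sb.append("&facet.query=").append(encode(value));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "published" is a hypothetical field name.
        System.out.println(toQueryString(slidingRanges("published")));
    }
}
```

The resulting string would be appended to the normal search request, so all four counts come back in a single round trip.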

The question:
Any tips or suggestions for ensuring this performs well?

thanks,

will


How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?

2010-12-08 Thread Will Milspec
Hi all,

Pardon if this isn't the best place to post this email...maybe it belongs on
the lucene-user list. Also, it's basically windows-specific, so not of use
to everyone...

The question: does NTFS fragmentation affect search performance "a little
bit" or "a lot"? It's obvious that "fragmentation will slow things down",
but is it a factor of 1.1, 10, or 100 (i.e. what order of magnitude)?

As a follow up: should solr/lucene users periodically remind Windows
sysadmins to defrag their drives ?

On a production system, I ran the windows defrag "analyzer" and found heavy
fragmentation on the lucene index.

11,839  492 MB  \data\index\search\_6io5.cfs
7,153   433 MB  \data\index\search\_5ld6.cfs
6,953   661 MB  \data\index\search\_8jvj.cfs
5,824   74 MB   \data\index\search\_5ld7.frq
5,691   356 MB  \data\index\search\_9eev.fdt
5,638   352 MB  \data\index\search\_8mqi.fdt
5,629   352 MB  \data\index\search\_8jvj.fdt
5,609   351 MB  \data\index\search\_88z8.fdt
5,590   355 MB  \data\index\search\_96l5.fdt
5,568   354 MB  \data\index\search\_8zjn.fdt
5,471   342 MB  \data\index\search\_5wgo.fdt
5,466   342 MB  \data\index\search\_5uo1.fdt
5,450   340 MB  \data\index\search\_5hrn.fdt
5,429   345 MB  \data\index\search\_6nyy.fdt
5,371   353 MB  \data\index\search\_8sob.fdt

Incidentally, we periodically experience some *very* slow searches. Out of
curiosity, I checked for file fragmentation (using the 'analyze' mode of the
NTFS defragger).

Nota bene: Windows Sysinternals has a utility, "Contig.exe", which allows you
to defragment individual files/directories. We'll use that to defragment
the index directories.

will


can solrj swap cores?

2010-12-03 Thread Will Milspec
hi all,

Does solrj support "swapping cores"?

One of our developers initially tried swapping solr cores (e.g. core0
and core1) using the solrj api, but it failed. He subsequently replaced the
call with straight http (i.e. http client).

Unfortunately I don't have the exact error in front of me...

Solrj code:

   CoreAdminRequest car = new CoreAdminRequest();
   car.setCoreName("production");
   car.setOtherCoreName("reindex");
   car.setAction(CoreAdminParams.CoreAdminAction.SWAP);

   SolrServer solrServer = SolrUtil.getSolrServer();
   car.process(solrServer);
   solrServer.commit();
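
For comparison, the "straight http" workaround amounts to a GET against the CoreAdmin handler. A minimal sketch, assuming the handler lives at <base>/admin/cores and that the base URL is http://localhost:8983/solr (both depend on the deployment); CoreSwapUrlSketch is an illustrative name, not our actual code.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Sketch: swap two cores by calling the CoreAdmin handler over plain http. */
final class CoreSwapUrlSketch {

    /** Builds the SWAP request url for the given solr base url and core names. */
    static String swapUrl(String solrBaseUrl, String core, String other) {
        return solrBaseUrl + "/admin/cores?action=SWAP"
                + "&core=" + encode(core)
                + "&other=" + encode(other);
    }

    /** Issues the GET and returns the http status code (200 is expected on success). */
    static int execute(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Only prints the url; call execute(...) against a running solr to swap.
        System.out.println(swapUrl("http://localhost:8983/solr", "production", "reindex"));
    }
}
```

Note the request goes to the solr base url rather than to an individual core's url.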

Finally, can someone comment on the solrj javadoc on CoreAdminRequest:
 * This class is experimental and subject to change.

thanks,

will


nexus of synonyms and stemming, take 2

2010-12-03 Thread Will Milspec
hi all,

[This is a second attempt at emailing. The apache mailing list spam filter
apparently did not like my synonyms entry, i.e. it classified my email as
spam. I have replaced phone with 'foo', 'cell' with 'sell' and 'mobile' with
'nubile'.]

This is a fairly basic synonyms question: how do synonyms handle stemming?


Example: Synonyms.txt has entry:
  sell,sell foo,nubile,nubile foo,wireless foo

If I want to match on 'sell foos'...

a) do I need to add an entry for 'sell foos' (i.e. in addition to sell foo)
b) or will the stemmer (porter/snowball) handle this already


thanks

will


best way to get maxDocs in java (i.e. as on stats.jsp page).

2010-12-01 Thread Will Milspec
hi all,

What's the best way to programmatically-in-java get the 'maxDoc' attribute
(as seen on the stats.jsp page)?

I don't see any hooks on the solrj api.

Currently I plan to use an http client to get stats.jsp (which returns xml)
and parse it using xpath.
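
Here is roughly what I have in mind for the xpath part, using only the JDK. The sample document in main is a simplified guess at the shape of the stats.jsp output (a "stat" element with a name attribute); the real page may nest things differently, so the expression deliberately matches the element anywhere in the document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Sketch: pull the maxDoc value out of a stats xml document with xpath. */
final class MaxDocXPathSketch {

    /** Returns the first maxDoc stat found anywhere in the given xml text. */
    static long maxDoc(String statsXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            String value = xpath.evaluate(
                    "(//stat[@name='maxDoc'])[1]",
                    new InputSource(new StringReader(statsXml)));
            // Values may be padded with whitespace, so trim before parsing.
            return Long.parseLong(value.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate xpath", e);
        }
    }

    public static void main(String[] args) {
        // Assumed, simplified shape of the stats output; not copied from solr.
        String sample = "<solr><solr-info><CORE><entry><name>searcher</name>"
                + "<stats><stat name=\"numDocs\"> 90 </stat>"
                + "<stat name=\"maxDoc\"> 100 </stat></stats>"
                + "</entry></CORE></solr-info></solr>";
        System.out.println(maxDoc(sample));
    }
}
```

In the real version the xml string would come from the http client's response body.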

If anyone can recommend a better approach, please opine.

thanks

will


Solr Git Tags

2010-11-08 Thread Will Milspec
Hi all,

(This question is more oriented to the developer, but may be relevant to
the solr user interested in perusing the source.)

I've cloned the git lucene-solr repository and was surprised to find no
tags.
   empty here: http://git.apache.org/lucene-solr.git/refs/tags/

Whereas the 'older' git repository (pre-lucene-solr-merge) has tags,
i.e. 1.4.0, 1.4.1, etc.:
  http://git.apache.org/solr.git/refs/tags/

Can someone point me to an explanation? Do I need to use svn instead?

I seek to check out the 1.4.1 source so I could patch a class. I want to
patch against the current stable version (1.4.1) rather than the latest
commit.

thanks,

will


Any Copy Field Caveats?

2010-11-05 Thread Will Milspec
Hi all,

We're moving from an old lucene version to solr and plan to use the "Copy
Field" functionality. Previously we had "rolled our own" implementation,
sticking title, description, etc. in a field called 'content'.

We lose some flexibility (i.e. the java layer can no longer control what gets
into the new copied field) in exchange for simplicity. A fair tradeoff IMO.

My question: has anyone found any subtle issues or "gotchas" with copy
fields?

(From the subject line: "caveat", pronounced 'kah-VEY-AT', is Latin, as in
"Caveat Emptor"..."let the buyer beware".)

thanks,

will



Override SynonymFilterFactory to load synonyms from alternate data source

2010-11-03 Thread Will Milspec
Hi all,

Can anyone comment on the ease/merit of overriding the shipped
SynonymFilterFactory with a version that could load the synonyms from an
alternate data source?

Our application currently maintains synonyms in its database; we could
export this data to 'synonyms.txt', but would prefer a db-aware
implementation of SynonymFilterFactory, i.e. avoiding that middle step.

From the looks of the class (private instances, static methods), it doesn't
lend itself to easy subclassing.
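
For reference, the "middle step" we would like to avoid is small. A sketch of the export, assuming each database row yields one group of equivalent terms and that the file takes one comma-separated group per line; SynonymExportSketch is a made-up name, and terms containing a comma or "=>" are rejected here rather than escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: turn synonym groups (e.g. rows from a database) into synonyms.txt lines. */
final class SynonymExportSketch {

    /** Formats one group of equivalent terms as a single comma-separated line. */
    static String toLine(List<String> group) {
        List<String> cleaned = new ArrayList<>();
        for (String term : group) {
            String t = term.trim();
            if (t.isEmpty()) {
                continue; // skip blank entries
            }
            if (t.contains(",") || t.contains("=>")) {
                throw new IllegalArgumentException("unsupported character in term: " + t);
            }
            cleaned.add(t);
        }
        return String.join(",", cleaned);
    }

    /** Formats every group, one line per group, dropping groups that end up empty. */
    static List<String> toLines(List<List<String>> groups) {
        List<String> lines = new ArrayList<>();
        for (List<String> group : groups) {
            String line = toLine(group);
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<List<String>> groups = new ArrayList<>();
        groups.add(Arrays.asList("lms", "learning management system"));
        groups.add(Arrays.asList("tv", "television"));
        for (String line : toLines(groups)) {
            System.out.println(line);
        }
    }
}
```

The lines would then be written to 'synonyms.txt' and the core reloaded, which is the extra moving part a db-aware factory would remove.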

Any comments or recommendations?

thanks

will


how to get TermVectorComponent using xml , vs. SOLR-949

2010-11-02 Thread Will Milspec
Hi all,

This seems a basic question: what's the best way to get
TermVectorComponent results from the Solr XML response?

SolrJ does not include TermVectorComponent in its api; the SOLR-949 patch
adds this ability, but after 2 years it's still not in the mainline. (And
it doesn't patch cleanly to the current head 1.4.)

I'm new to Solr and familiar with SolrJ, but not with the best means for
getting/parsing the raw xml. (Typically I find the dtd and write code to
parse the dom using the dtd. In this case I've seen a few examples, but
nothing definitive.)

Our team would rather use the "out of the box" solr than manually
apply patches and worry about consistency during upgrades...

Thanks in advance,

will