Boost documents based on criteria

2015-01-23 Thread Jorge Luis Betancourt González
Hi all,

Recently I got an interesting use case that I'm not sure how to implement. The
idea is that the client wants a fixed number of documents, let's call it N, to
appear at the top of the results. Let me explain a little: we're working with
web documents, and the idea is to promote the documents that match the user's
query and come from a given domain (wikipedia, for example) to the top of the
list. So if I apply a boost using the boost parameter:

http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia

I get *all* the documents from the desired host at the top, but there is no way
of limiting how many documents from that host are boosted to the top of the
result list (which could lead to several pages of content from the same host,
which is not desired; the idea is to show only N). I was thinking of something
like field collapsing/grouping, but only for the documents that match my
$type1query parameter (host:wikipedia), and I don't see any way of doing
grouping/collapsing on only one group while leaving the other results untouched.

I also thought of using two groups, with group.query=host:wikipedia and
group.query=-host:wikipedia, but in that case there is no way of controlling
how many documents each group will return independently.
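
For concreteness, here is a rough SolrJ sketch of that two-group request (the
host value, the base URL and the limit of 3 are only illustrative). Note that
group.limit caps the number of documents returned per group, but the same cap
applies to every group, which is exactly the limitation mentioned above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.GroupParams;

public class GroupedPromotionSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("search");
        q.setFields("url");
        q.set(GroupParams.GROUP, true);
        // one group for the promoted host, one group for everything else
        q.add(GroupParams.GROUP_QUERY, "host:wikipedia");
        q.add(GroupParams.GROUP_QUERY, "-host:wikipedia");
        // caps the documents returned per group; applies to both groups equally
        q.set(GroupParams.GROUP_LIMIT, 3);
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getGroupResponse().getValues());
        solr.shutdown();
    }
}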

Any thoughts or recommendations on this? 

Thank you,

Regards,





Re: Solr regex query help

2015-01-23 Thread Erick Erickson
Right. As I mentioned on the original JIRA, the regex match is happening on
_terms_. You are conflating the original input (the entire field) with the
individual terms that the regex is applied to.

I suggest that you look at the admin/analysis page. There you'll see the terms
that are actually indexed, and you'll see that the regex cannot work as
written, since it assumes it is applied to the entire input rather than to the
output of the analysis chain.

I further suggest that you explore tokenization and how individual terms are
searched. The admin/analysis page is invaluable in this endeavor.

The root cause of your confusion is that, because you're using
ClassicTokenizer, you have a bunch of individual terms that are being
searched, _not_ the whole input. So the regex is bound to fail as long as
you're thinking in terms of the entire input rather than the result of your
analysis chain, i.e. tokenization + filters as defined in schema.xml.
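
For illustration, a minimal standalone sketch (assuming the Lucene 4.10.x
analyzers-common jar is on the classpath) that prints the terms
ClassicTokenizer actually produces for the sample line from the original post;
this is roughly what the admin/analysis page shows for the tokenizer stage:

import java.io.StringReader;

import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShowTerms {
    public static void main(String[] args) throws Exception {
        String field = "1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC "
                + "Power/Reset action ::PIPE:: Delayed shutdown timer disabled ::PIPE:: Asserted";
        ClassicTokenizer tokenizer = new ClassicTokenizer(new StringReader(field));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            // each printed line is a separate indexed term; the regex is matched
            // against these terms, not against the original line as a whole
            System.out.println(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
    }
}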

FWIW,
Erick

On Fri, Jan 23, 2015 at 8:58 PM, Arumugam, Suresh 
wrote:

> Hi All,
>
>
>
> We have indexed the documents to Solr & not able to query using the Regex.
>
>
>
> Our data looks like as below in a Text Field, which is indexed using the
> ClassicTokenizer.
>
>
>
> *1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC
> Power/Reset action  ::PIPE:: Delayed shutdown timer disabled ::PIPE::
> Asserted*
>
>
>
> We tried lookup this string with the Regex.
>
> *PIPE*[0-9]{2}\/[0-9}{2}\/[0-9]{4}*Delayed shutdown*Asserted*
>
>
>
> Since the analyzer tokenized the data, the regex match is
> happening on the terms & it’s not working as we expect.
>
>
>
> Can you please help us in finding an equivalent way to query this in Solr
> ?
>
>
>
> The following are the details about our environment.
>
>
>
> 1.   Solr 4.10.3 as well as Solr 4.8
>
> 2.   JDK 1.7_51
>
> 3.   SolrConfig.xml & Schema.xml attached.
>
>
>
> The regex query as below is working
>
> msg:/[0-9]{2}/
>
>
>
> But when we want to match more than one terms the regex doesn't seems to
> be working.
>
> Please help us in resolving this issue.
>
>
>
> Thanks in advance.
>
>
>
> Regards,
>
> Suresh.A
>


Solr regex query help

2015-01-23 Thread Arumugam, Suresh
Hi All,

We have indexed documents into Solr but are not able to query them using a regex.

Our data looks like as below in a Text Field, which is indexed using the 
ClassicTokenizer.

1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC 
Power/Reset action  ::PIPE:: Delayed shutdown timer disabled ::PIPE:: Asserted

We tried to look up this string with the following regex:
PIPE*[0-9]{2}\/[0-9}{2}\/[0-9]{4}*Delayed shutdown*Asserted

Since the analyzer tokenizes the data, the regex match happens on the
individual terms, and it's not working as we expect.

Can you please help us find an equivalent way to query this in Solr?

The following are the details about our environment.


1.   Solr 4.10.3 as well as Solr 4.8

2.   JDK 1.7_51

3.   SolrConfig.xml & Schema.xml attached.

The regex query below works:
msg:/[0-9]{2}/

But when we want to match more than one term, the regex doesn't seem to work.
Please help us in resolving this issue.

Thanks in advance.

Regards,
Suresh.A


Re: Replicas fall into recovery mode right after update

2015-01-23 Thread Nishanth S
Can you tell us what version of Solr you are using and what causes your
replicas to go into recovery?

On Fri, Jan 23, 2015 at 8:40 PM, gouthsmsimhadri 
wrote:

> I'm working with a cluster of solr-cloud servers at a configration of 10
> shards and 4 replicas on each shard in stress environment.
> Planned production configuration is 10 shards and 15 replicas on each
> shard.
>
> Current commit settings are as follows
>
> 
> 50
> 18
> 
>
> 
> 200
> 18
> false
> 
>
>
> The application requires to index approximately 90 Million docs which is
> indexed in two ways
> a)  Full indexing. It takes 4 hours to index 90 Million docs and the
> rate of
> docs coming to the searcher is around 6000 per second
> b)  Incremental indexing. It takes an hour to index delta changes.
> Roughly
> there are 3 million changes and rate of docs coming to the searchers is
> 2500
> per second
>
> I use two collections for example collection1 and collection2
> Each collection has system settings at 12 GB of available RAM and quad core
> Intel(R) Xeon(R) CPU X5570  @ 2.93GHz
>
> Full indexing is always performed on a collection which is not serving live
> traffic and Once job is completed we swap collection so the collection with
> latest data serves traffic and other is inactive.
>
> The other mode of incremental indexing  is performed  always on the
> collection which is serving live traffic.
>
> The problem is in about 10 minutes of indexing is triggered, the replicas
> goes in to recovery mode. This happens on all the shards. In about 20
> minutes or more rest of replicas start to fall into recovery mode. In about
> half an hour all replicas except the leader is in recovery mode.
>
> I cannot throttle the indexing load as that will increase our overall
> indexing time. So to overcome this issue, I remove all the replicas before
> the indexing is started and then add them after the indexing completes.
>
> The behavior(replicas falling into recovery mode) in incremental mode of
> indexing is troublesome as i cannot remove replicas during incremental
> indexing since it serves live traffic, i tried to throttle the speed at
> which documents are indexed but with no success as the cluster still goes
> on
> recovery.
>
> If i let the cluster as is the indexing  eventually completes and also
> recovers after a while, but since this is serving live traffic i just
> cannot
> let these replicas go into recovery mode since it degrades the search
> performance also (from the tests performed).
>
> I tried different commit settings like the below
> a)  No auto soft commit, no auto hard commit and a commit triggered at
> the
> end of indexing
> b)  No auto soft commit, yes auto hard commit and a commit in the end
> of
> indexing
> c)  Yes auto soft commit , no auto hard commit
> d)  Yes auto soft commit , yes auto hard commit
> e)  Different frequency setting for commits for above
>
> Unfortunately all the above yields the same behavior . The replicas still
> goes in recovery
>
> I have increased the zookeeper timeout from 30 seconds to 5 minutes and the
> problem persists.
>
> Is there any setting that would fix this issue ?
>
>
>
>
> -
>  -goutham
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Replicas-fall-into-recovery-mode-right-after-update-tp4181706.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Replicas fall into recovery mode right after update

2015-01-23 Thread gouthsmsimhadri
I'm working with a cluster of solr-cloud servers at a configuration of 10
shards and 4 replicas on each shard in a stress environment.
The planned production configuration is 10 shards and 15 replicas on each shard.

Current commit settings are as follows


50
18



200
18
false



The application requires indexing approximately 90 million docs, which is done
in two ways:
a)  Full indexing. It takes 4 hours to index the 90 million docs, and the rate
of docs coming to the searcher is around 6000 per second.
b)  Incremental indexing. It takes an hour to index delta changes. Roughly
there are 3 million changes, and the rate of docs coming to the searchers is
2500 per second.

I use two collections, for example collection1 and collection2.
Each collection runs with 12 GB of available RAM and a quad-core
Intel(R) Xeon(R) CPU X5570 @ 2.93GHz.

Full indexing is always performed on the collection that is not serving live
traffic, and once the job is completed we swap collections, so the collection
with the latest data serves traffic and the other becomes inactive.

Incremental indexing, on the other hand, is always performed on the
collection that is serving live traffic.

The problem is that within about 10 minutes of indexing being triggered, the
replicas go into recovery mode. This happens on all the shards. In about 20
minutes or more the rest of the replicas start to fall into recovery, and in
about half an hour all replicas except the leaders are in recovery mode.

I cannot throttle the indexing load as that will increase our overall
indexing time. So to overcome this issue, I remove all the replicas before
the indexing is started and then add them after the indexing completes.

The behavior (replicas falling into recovery mode) is especially troublesome in
the incremental mode of indexing, as I cannot remove replicas during
incremental indexing since the collection serves live traffic. I tried to
throttle the speed at which documents are indexed, but with no success, as the
cluster still goes into recovery.

If I leave the cluster as is, the indexing eventually completes and the cluster
also recovers after a while, but since this is serving live traffic I just
cannot let these replicas go into recovery mode, since it also degrades search
performance (based on the tests performed).

I tried different commit settings like the below
a)  No auto soft commit, no auto hard commit and a commit triggered at the
end of indexing
b)  No auto soft commit, yes auto hard commit and a commit in the end of
indexing
c)  Yes auto soft commit , no auto hard commit 
d)  Yes auto soft commit , yes auto hard commit 
e)  Different frequency setting for commits for above

Unfortunately all of the above yield the same behavior: the replicas still
go into recovery.

I have increased the zookeeper timeout from 30 seconds to 5 minutes and the
problem persists. 

Is there any setting that would fix this issue?




-
 -goutham
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Replicas-fall-into-recovery-mode-right-after-update-tp4181706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help importing data

2015-01-23 Thread Carl Roberts

NVM

I figured this out.  The problem was this: pk="link" in
rss-data-config.xml, but the unique key in schema.xml is not link, it is id.


From rss-data-config.xml:

https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";
processor="XPathEntityProcessor"
forEach="/nvd/entry">

commonField="true" />
commonField="true" />




From schema.xml:

* id

What really bothers me is that there were no errors output by Solr to 
indicate this type of misconfiguration, and all the messages that 
Solr gave indicated the import was successful.  This lack of appropriate 
error reporting is a pain, especially for someone learning Solr.


Switching pk="link" to pk="id" solved the problem and I was then able to 
import the data.



On 1/23/15, 9:39 PM, Carl Roberts wrote:

Hi,

I have set log4j logging to level DEBUG and I have also modified the 
code to see what is being imported and I can see the nextRow() 
records, and the import is successful, however I have no data. Can 
someone please help me figure this out?


Here is the logging output:

ow:  r1={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-20

Re: Need Help with custom ZIPURLDataSource class

2015-01-23 Thread Carl Roberts

NVM - I have this working.

The problem was this: pk="link" in rss-data-config.xml, but the unique key in
schema.xml is not link, it is id.


From rss-data-config.xml:


url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";

processor="XPathEntityProcessor"
forEach="/nvd/entry">

commonField="true" />
commonField="true" />




From schema.xml:

* id

What really bothers me is that there were no errors output by Solr to 
indicate this type of misconfiguration, and all the messages that 
Solr gave indicated the import was successful.  This lack of appropriate 
error reporting is a pain, especially for someone learning Solr.


Switching pk="link" to pk="id" solved the problem and I was then able to 
import the data.

On 1/23/15, 6:34 PM, Carl Roberts wrote:


Hi,

I created a custom ZIPURLDataSource class to unzip the content from an
http URL for an XML ZIP file and it seems to be working (at least I have
no errors), but no data is imported.

Here is my configuration in rss-data-config.xml:




https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";
processor="XPathEntityProcessor"
forEach="/nvd/entry"
transformer="DateFormatTransformer">




xpath="/nvd/entry/vulnerable-software-list/product" 
commonField="false" />









Attached is the ZIPURLDataSource.java file.

It actually unzips and saves the raw XML to disk, which I have 
verified to be a valid XML file.  The file has one or more entries 
(here is an example):


http://scap.nist.gov/schema/scap-core/0.1";
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xmlns:patch="http://scap.nist.gov/schema/patch/0.1";
xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4";
xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2";
xmlns:cpe-lang="http://cpe.mitre.org/language/2.0";
xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0";
pub_date="2015-01-10T05:37:05"
xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1
http://nvd.nist.gov/schema/patch_0.1.xsd
http://scap.nist.gov/schema/scap-core/0.1
http://nvd.nist.gov/schema/scap-core_0.1.xsd
http://scap.nist.gov/schema/feed/vulnerability/2.0
http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd"; nvd_xml_version="2.0">

http://nvd.nist.gov/";>



























cpe:/o:freebsd:freebsd:2.2.8
cpe:/o:freebsd:freebsd:1.1.5.1
cpe:/o:freebsd:freebsd:2.2.3
cpe:/o:freebsd:freebsd:2.2.2
cpe:/o:freebsd:freebsd:2.2.5
cpe:/o:freebsd:freebsd:2.2.4
cpe:/o:freebsd:freebsd:2.0.5
cpe:/o:freebsd:freebsd:2.2.6
cpe:/o:freebsd:freebsd:2.1.6.1
cpe:/o:freebsd:freebsd:2.0.1
cpe:/o:freebsd:freebsd:2.2
cpe:/o:freebsd:freebsd:2.0
cpe:/o:openbsd:openbsd:2.3
cpe:/o:freebsd:freebsd:3.0
cpe:/o:freebsd:freebsd:1.1
cpe:/o:freebsd:freebsd:2.1.6
cpe:/o:openbsd:openbsd:2.4
cpe:/o:bsdi:bsd_os:3.1
cpe:/o:freebsd:freebsd:1.0
cpe:/o:freebsd:freebsd:2.1.7
cpe:/o:freebsd:freebsd:1.2
cpe:/o:freebsd:freebsd:2.1.5
cpe:/o:freebsd:freebsd:2.1.7.1

CVE-1999-0001
1999-12-30T00:00:00.000-05:00 

2010-12-16T00:00:00.000-05:00 




5.0
NETWORK
LOW
NONE
NONE
NONE
PARTIAL
http://nvd.nist.gov
2004-01-01T00:00:00.000-05:00 






OSVDB
http://www.osvdb.org/5707";
xml:lang="en">5707


CONFIRM
http://www.openbsd.org/errata23.html#tcpfix";
xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix 



ip_input.c in BSD-derived TCP/IP implementations allows
remote attackers to cause a denial of service (crash or hang) via
crafted packets.



Here is the curl command:

curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import

And here is the output from the console for Jetty:

main{StandardDirectoryReader(segments_1:1:nrt)}
2407 [coreLoadExecutor-5-thread-1] INFO
org.apache.solr.core.CoreContainer – registering core: nvd-rss
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
user.dir=/Users/carlroberts/dev/solr-4.10.3/example
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
SolrDispatchFilter.init() done
2431 [main] INFO org.eclipse.jetty.server.AbstractConnector – Started
SocketConnector@0.0.0.0:8983
2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
[nvd-rss] webapp=null path=null
params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} 


hits=0 status=0 QTime=43
2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
QuerySenderListener done.
2451 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SpellCheckComponent – Loading spell
index for spellchecker: default
2451 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SpellCheckComponent – Loading spell
index for spellchecker: wordbreak
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SuggestComponent – Loading suggester
index for: mySuggester
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.spelling.suggest.SolrSuggester – reload()
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.spelling.suggest.

Need help importing data

2015-01-23 Thread Carl Roberts

Hi,

I have set log4j logging to level DEBUG and I have also modified the 
code to see what is being imported. I can see the nextRow() records 
and the import reports success; however, I have no data.  Can someone 
please help me figure this out?


Here is the logging output:

ow:  r1={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, 
$forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
2015-01-23 21:28:04,606- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r1={{id=CVE-2002-2360, cve=CVE-2002-2360, cwe=CWE-264, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
r3={{id=CVE-2002-2360, cve=CVE-2002-2360, cwe=CWE-264, $forEach=/nvd/entry}}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: 
URL={url}
2015-01-23 21:28:04,607- 
INFO-[Thread-15]-[XPath

Re: SolrCloud result correctness compared with single core

2015-01-23 Thread Erick Erickson
You might, but probably not enough to notice. At 50G, the tf/idf
stats will _probably_ be close enough that you won't be able to tell.

That said, distributed tf/idf has recently been implemented, but
you need to ask for it; see SOLR-1632. This is Solr 5.0, though.

I've rarely seen it matter except in fairly specialized situations.
Consider a single core: deleted documents still count towards
some of the tf/idf stats, so your scoring could theoretically
change after, say, an optimize.

The so-called "bottom line" is that yes, the scoring may change, but
IMO not any more radically than was possible with single cores,
and I wouldn't worry about it unless I had evidence that it was
biting me.

Best,
Erick

On Fri, Jan 23, 2015 at 2:52 PM, Yandong Yao  wrote:

> Hi Guys,
>
> As the main scoring mechanism is based tf/idf, so will same query running
> against SolrCloud return different result against running it against single
> core with same data sets as idf will only count df inside one core?
>
> eg: Assume I have 100GB data:
> A) Index those data using single core
> B) Index those data using SolrCloud with two cores (each has 50GB data
> index)
>
> Then If I query those with same query like 'apple', then will I get
> different result for A and B?
>
>
> Regards,
> Yandong
>


Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jorge Luis Betancourt González
Thank you, Michael, for sharing your patch! It was really helpful, but for our 
particular requirement a SearchComponent that rewrites our query is enough (as 
suggested by Alexandre, thanks a lot); basically we just escape a 
bunch of * that we know are "problematic". 

This approach allows us to quietly avoid the wildcard query and instead treat it 
as a normal term query, rather than throwing a SyntaxError, which is more 
convenient in our case.
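
For anyone curious, a component along these lines can be quite small. The
sketch below is only illustrative (it is not the actual component we use, and
the class name is made up); it rewrites '*' in the incoming q parameter so the
character is treated literally, and it would still need to be registered as a
first-components entry on the relevant request handler:

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class EscapeWildcardsComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        String q = rb.req.getParams().get(CommonParams.Q);
        if (q != null && q.indexOf('*') >= 0) {
            ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
            // escape the wildcard so the query parser treats it as a literal character
            params.set(CommonParams.Q, q.replace("*", "\\*"));
            rb.req.setParams(params);
        }
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time; the rewrite happens in prepare()
    }

    @Override
    public String getDescription() {
        return "Escapes wildcard characters in the user query";
    }

    @Override
    public String getSource() {
        return null;
    }
}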

Regards,
 

- Original Message -
From: "Michael F. Ryan (LNG-DAY)" 
To: solr-user@lucene.apache.org
Sent: Friday, January 23, 2015 8:26:48 AM
Subject: RE: Avoiding wildcard queries using edismax query parser

Here's a Jira for this: https://issues.apache.org/jira/browse/SOLR-3031

I've attached a patch there that might be useful for you.

-Michael

-Original Message-
From: Jorge Luis Betancourt González [mailto:jlbetanco...@uci.cu] 
Sent: Thursday, January 22, 2015 4:34 PM
To: solr-user@lucene.apache.org
Subject: Avoiding wildcard queries using edismax query parser

Hello all,

Currently we are using edismax query parser in an internal application, we've 
detected that some wildcard queries including "*" are causing some performance 
issues and for this particular case we're not interested in allowing any user 
to request all the indexed documents. 

This could be easily escaped in the application level, but right now we have 
several applications (using several programming languages) consuming from Solr, 
and adding this into each application is kind of exhausting, so I'm wondering 
if there is some configuration that allow us to treat this special characters 
as normal alphanumeric characters. 

I've tried one solution that worked before, involving the WordDelimiterFilter 
an the types attribute:



and in characters.txt I've mapped the special characters into ALPHA:

+ => ALPHA 
* => ALPHA 

Any thoughts on this?





Re: Connection Reset Errors with Solr 4.4

2015-01-23 Thread Mike Drob
I'm not sure what a reasonable workaround would be. Perhaps somebody else
can brainstorm and make a suggestion, sorry.

On Tue, Jan 20, 2015 at 12:56 PM, Nishanth S 
wrote:

> Thank you Mike.Sure enough,we are running into the same issue you
> mentoined.Is there a quick fix for this other than the patch.I do not see
> the tlogs getting replayed at all.It is doing a full index recovery from
> the leader and our index size is around 200G.Would lowering the autocommit
> settings help(where the replica would go for a tlog replay as the tlogs I
> see are not huge).
>
> Thanks,
> Nishanth
>
> On Tue, Jan 20, 2015 at 10:46 AM, Mike Drob  wrote:
>
> > Are we sure this isn't SOLR-6931?
> >
> > On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S 
> > wrote:
> >
> > > Hello All,
> > >
> > > We are running solr cloud 4.4 with 30 shards and 3 replicas with real
> > time
> > > indexing on rhel 6.5.The indexing rate is 3K Tps now.We are running
> into
> > an
> > > issue with replicas going into recovery mode  due to connection reset
> > > errors.Soft commit time is 2 min and auto commit is set as 5 minutes.I
> > have
> > > seen that replicas do a full index recovery which takes a long
> > > time(days).Below is the error trace that  I see.I would really
> appreciate
> > > any help in this case.
> > >
> > > g.apache.solr.client.solrj.SolrServerException: IOException occured
> when
> > > talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
> > > at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
> > > at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> > > at
> > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
> > > at
> > >
> > >
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > at
> > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > at
> > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.net.SocketException: Connection reset
> > > at java.net.SocketInputStream.read(SocketInputStream.java:196)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:122)
> > > at
> > >
> > >
> >
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
> > > at
> > >
> > >
> >
> org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
> > > at
> > >
> > >
> >
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
> > > at
> > >
> > >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
> > > at
> > >
> > >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
> > > at
> > >
> > >
> >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
> > > at
> > >
> > >
> >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
> > > at
> > >
> > >
> >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
> > > at
> > >
> > >
> >
> org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
> > > at
> > >
> > >
> >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
> > > at
> > >
> > >
> >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
> > > at
> > >
> > >
> >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
> > > at
> > >
> > >
> >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
> > > at
> > >
> > >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> > > at
> > >
> > >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> > > at
> > >
> > >
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> > > at
> > >
> > >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
> > > ... 9 more
> > >
> > >
> > > Thanks,
> > > Nishanth
> > >
> >
>


Re: Solr I/O increases over time

2015-01-23 Thread Shawn Heisey
On 1/23/2015 3:52 PM, Daniel Cukier wrote:
> I am running around eight solr servers (version 3.5) instances behind a
> Load Balancer. All servers are identical and the LB is weighted by number
> connections. The servers have around 4M documents and receive a constant
> flow of queries. When the solr server starts, it works fine. But after some
> time running, it starts to take longer respond to queries, and the server
> I/O goes crazy to 100%. Look at the New Relic graphic:
>
> [image: enter image description here]
>
> If the servers behaves well in the beginning, I it starts to fail after
> some time? Then if I restart the server, it gets back to low I/O for same
> time and this repeats over and over.

The mailing list eats almost all attachments.  We can't see your image. 
You can use http://apaste.info for images (up to 1MB) and text, or pick
another hosting provider, and include the URL in your reply.

Most performance problems like this are memory related.  The high I/O
you mentioned definitely sounds like it could be a situation where you
don't have enough RAM available for OS disk cache.  When the OS cannot
cache the index effectively, queries will result in a large amount of
real disk I/O.  If there's enough memory for caching, queries will be
entirely or mostly handled from RAM, which is *MUCH* faster than the disk.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Need Help with custom ZIPURLDataSource class

2015-01-23 Thread Carl Roberts


Hi,

I created a custom ZIPURLDataSource class to unzip the content from an
http URL for an XML ZIP file and it seems to be working (at least I have
no errors), but no data is imported.

Here is my configuration in rss-data-config.xml:




https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip";
processor="XPathEntityProcessor"
forEach="/nvd/entry"
transformer="DateFormatTransformer">













Attached is the ZIPURLDataSource.java file.

It actually unzips and saves the raw XML to disk, which I have verified to be a 
valid XML file.  The file has one or more entries (here is an example):

http://scap.nist.gov/schema/scap-core/0.1";
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xmlns:patch="http://scap.nist.gov/schema/patch/0.1";
xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4";
xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2";
xmlns:cpe-lang="http://cpe.mitre.org/language/2.0";
xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0";
pub_date="2015-01-10T05:37:05"
xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1
http://nvd.nist.gov/schema/patch_0.1.xsd
http://scap.nist.gov/schema/scap-core/0.1
http://nvd.nist.gov/schema/scap-core_0.1.xsd
http://scap.nist.gov/schema/feed/vulnerability/2.0
http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd"; nvd_xml_version="2.0">

http://nvd.nist.gov/";>



























cpe:/o:freebsd:freebsd:2.2.8
cpe:/o:freebsd:freebsd:1.1.5.1
cpe:/o:freebsd:freebsd:2.2.3
cpe:/o:freebsd:freebsd:2.2.2
cpe:/o:freebsd:freebsd:2.2.5
cpe:/o:freebsd:freebsd:2.2.4
cpe:/o:freebsd:freebsd:2.0.5
cpe:/o:freebsd:freebsd:2.2.6
cpe:/o:freebsd:freebsd:2.1.6.1
cpe:/o:freebsd:freebsd:2.0.1
cpe:/o:freebsd:freebsd:2.2
cpe:/o:freebsd:freebsd:2.0
cpe:/o:openbsd:openbsd:2.3
cpe:/o:freebsd:freebsd:3.0
cpe:/o:freebsd:freebsd:1.1
cpe:/o:freebsd:freebsd:2.1.6
cpe:/o:openbsd:openbsd:2.4
cpe:/o:bsdi:bsd_os:3.1
cpe:/o:freebsd:freebsd:1.0
cpe:/o:freebsd:freebsd:2.1.7
cpe:/o:freebsd:freebsd:1.2
cpe:/o:freebsd:freebsd:2.1.5
cpe:/o:freebsd:freebsd:2.1.7.1

CVE-1999-0001
1999-12-30T00:00:00.000-05:00
2010-12-16T00:00:00.000-05:00


5.0
NETWORK
LOW
NONE
NONE
NONE
PARTIAL
http://nvd.nist.gov
2004-01-01T00:00:00.000-05:00




OSVDB
http://www.osvdb.org/5707";
xml:lang="en">5707


CONFIRM
http://www.openbsd.org/errata23.html#tcpfix";
xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix

ip_input.c in BSD-derived TCP/IP implementations allows
remote attackers to cause a denial of service (crash or hang) via
crafted packets.



Here is the curl command:

curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import

And here is the output from the console for Jetty:

main{StandardDirectoryReader(segments_1:1:nrt)}
2407 [coreLoadExecutor-5-thread-1] INFO
org.apache.solr.core.CoreContainer – registering core: nvd-rss
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
user.dir=/Users/carlroberts/dev/solr-4.10.3/example
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
SolrDispatchFilter.init() done
2431 [main] INFO org.eclipse.jetty.server.AbstractConnector – Started
SocketConnector@0.0.0.0:8983
2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
[nvd-rss] webapp=null path=null
params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false}
hits=0 status=0 QTime=43
2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
QuerySenderListener done.
2451 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SpellCheckComponent – Loading spell
index for spellchecker: default
2451 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SpellCheckComponent – Loading spell
index for spellchecker: wordbreak
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.handler.component.SuggestComponent – Loading suggester
index for: mySuggester
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.spelling.suggest.SolrSuggester – reload()
2452 [searcherExecutor-6-thread-1] INFO
org.apache.solr.spelling.suggest.SolrSuggester – build()
2459 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
[nvd-rss] Registered new searcher Searcher@df9e84e[nvd-rss]
main{StandardDirectoryReader(segments_1:1:nrt)}
8371 [qtp1640586218-17] INFO
org.apache.solr.handler.dataimport.DataImporter – Loading DIH
Configuration: rss-data-config.xml
8379 [qtp1640586218-17] INFO
org.apache.solr.handler.dataimport.DataImporter – Data Configuration
loaded successfully
8383 [Thread-15] INFO org.apache.solr.handler.dataimport.DataImporter –
Starting Full Import
8384 [qtp1640586218-17] INFO org.apache.solr.core.SolrCore – [nvd-rss]
webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=15
8396 [Thread-15] INFO
org.apache.solr.handler.dataimport.SimplePropertiesWriter – Read
dataimport.properties
23431 [commitScheduler-8-thread-1] INFO
org.apache.solr.update.UpdateHandler – start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommi

Solr I/O increases over time

2015-01-23 Thread Daniel Cukier
I am running around eight Solr server (version 3.5) instances behind a
load balancer. All servers are identical and the LB is weighted by number of
connections. The servers have around 4M documents and receive a constant
flow of queries. When a Solr server starts, it works fine, but after some
time running it starts to take longer to respond to queries, and the server
I/O goes crazy to 100%. Look at the New Relic graphic:

[image: enter image description here]

Why do the servers behave well in the beginning but start to fail after
some time? If I restart a server, it goes back to low I/O for some
time, and this repeats over and over.
Daniel Cukier



SolrCloud result correctness compared with single core

2015-01-23 Thread Yandong Yao
Hi Guys,

Since the main scoring mechanism is based on tf/idf, will the same query
running against SolrCloud return different results than running it against a
single core with the same data set, given that idf only counts document
frequency inside one core?

e.g. assume I have 100GB of data:
A) Index the data using a single core
B) Index the data using SolrCloud with two cores (each holding a 50GB
index)

If I then run the same query, say 'apple', will I get different results for A
and B?


Regards,
Yandong


Re: How to inject custom response data after results have been sorted

2015-01-23 Thread tedsolr
Thank you so much for your responses Hoss and Shalin. I gather that the
DocTransformer allows manipulations of the doc list returned in the results.
That is very cool. So the transformer has access to the Solr request. I
haven't seen the hook yet, but I believe you; I'll have to keep looking. It
would certainly be cleaner to return my stats as "fields" within each doc.
My plan was to attach the stats as a map to the response and post-process
them in my app.

I was able to quickly mock up a custom SearchComponent and verify that it
receives the doc list in sorted order, and that I could retrieve objects
from the request context. So this search component would allow me to simply
"paste" the filtered map of stats onto the response.

Is there a performance benefit one way or the other? Is it just easier in
the DocTransformer since there is a transform(doc, id) method that must get
called for every returned doc?
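
For reference, here is roughly what I have in mind for the transformer route.
This is only a sketch (the "myStats" context key and the "id" uniqueKey field
are placeholders), assuming a component earlier in the chain has already
stashed the per-document stats map in the request context:

import java.util.Map;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class StatsAugmenterFactory extends TransformerFactory {

    @Override
    public DocTransformer create(final String field, SolrParams params,
                                 final SolrQueryRequest req) {
        return new DocTransformer() {
            @Override
            public String getName() {
                return field;
            }

            @Override
            public void transform(SolrDocument doc, int docid) {
                // stats map placed in the request context by an earlier search component
                @SuppressWarnings("unchecked")
                Map<Object, Object> stats =
                        (Map<Object, Object>) req.getContext().get("myStats");
                Object id = doc.getFieldValue("id");
                if (stats != null && id != null && stats.containsKey(id)) {
                    doc.setField(field, stats.get(id));
                }
            }
        };
    }
}

Either way the work is proportional to the page of returned documents, so for
typical page sizes the performance difference should be negligible.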



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-inject-custom-response-data-after-results-have-been-sorted-tp4181545p4181602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiple data source indexing through data import handler

2015-01-23 Thread Qiu Mo

Alex,

Thanks, I tried ${item.id}; it doesn't work.
However, if I hardcode an id number instead of '${item.id}', then that one row
is added to every document.

For example, with
select description from feature where item_id=3456

that single description is added to every document as a field. It seems the
parent entity's item.id is
not passed to the second entity.

Thanks,

Qiu Mo (Joe)



From: Alexandre Rafalovitch 
Sent: Friday, January 23, 2015 3:25 PM
To: solr-user
Subject: Re: multiple data source indexing through data import handler

Try ${item.id} as that's what you are mapping it to.

See also: https://issues.apache.org/jira/browse/SOLR-4383

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 15:01, Qiu Mo  wrote:
> I am indexing data from two different databases, but I can't add second 
> database to indexing, can anyone help!  below is my dats-config.xml
>
>
>  url="jdbc:mysql://XXX" user="XXX" password="XXX"/>
>
>  url="jdbc:mysql://XXX" user="XXX" password="XXX"/>
>
>
> 
>
> 
>
> 
>
>
> 
>
> 
>
> 
>
>
>
> 
>
> my log indicate that '${item.ID}' is not catch any value from entity item.
>
> Thanks,
>
> Joe Moore

Re: multiple data source indexing through data import handler

2015-01-23 Thread Alexandre Rafalovitch
Try ${item.id} as that's what you are mapping it to.

See also: https://issues.apache.org/jira/browse/SOLR-4383

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 15:01, Qiu Mo  wrote:
> I am indexing data from two different databases, but I can't add second 
> database to indexing, can anyone help!  below is my dats-config.xml
>
>
>  url="jdbc:mysql://XXX" user="XXX" password="XXX"/>
>
>  url="jdbc:mysql://XXX" user="XXX" password="XXX"/>
>
>
> 
>
> 
>
> 
>
>
> 
>
> 
>
> 
>
>
>
> 
>
> my log indicate that '${item.ID}' is not catch any value from entity item.
>
> Thanks,
>
> Joe Moore


Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
Unzipping things might be an issue. You may need to do that as part of
a batch job outside of Solr. For the rest, go through the
documentation first; it does answer a bunch of questions. There is
also a page on the wiki, not just in the reference guide.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 14:51, Carl Roberts  wrote:
> Excellent - thanks Shalin.  But how does delta-import work?  Does it do a
> clean also?  Does it require a unique Id?  Does it update existing records
> and only add when necessary?
>
> And, how would I go about unzipping the content from a URL to then import
> the unzipped XML?  Is the recommended way to extend the URLDataSource class
> or is there any built-in logic to plug in pre-processing handlers?
>
>
> And,
>
> On 1/23/15, 2:39 PM, Shalin Shekhar Mangar wrote:
>>
>> If you add clean=false as a parameter to the full-import then deletion is
>> disabled. Since you are ingesting RSS there is no need for deletion at all
>> I guess.
>>
>> On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts
>> >>
>>> wrote:
>>> OK - Thanks for the doc.
>>>
>>> Is it possible to just provide an empty value to preImportDeleteQuery to
>>> disable the delete prior to import?
>>>
>>> Will the data still be deleted for each entity during a delta-import
>>> instead of full-import?
>>>
>>> Is there any capability in the handler to unzip an XML file from a URL
>>> prior to reading it or can I perhaps hook a custom pre-processing
>>> handler?
>>>
>>> Regards,
>>>
>>> Joe
>>>
>>>
>>>
>>> On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:
>>>
 https://cwiki.apache.org/confluence/display/solr/
 Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

 Admin UI has the interface, so you can play there once you define it.

 You do have to use Curl, there is no built-in scheduler.

 Regards,
  Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 23 January 2015 at 13:29, Carl Roberts
 
 wrote:

> Hi Alex,
>
> If I am understanding this correctly, I can define multiple entities
> like
> this?
>
> 
>   
>   
>   
>   ...
> 
>
> How would I trigger loading certain entities during start?
>
> How would I trigger loading other entities during update?
>
> Is there a way to set an auto-update for certain entities so that I
> don't
> have to invoke an update via curl?
>
> Where / how do I specify the preImportDeleteQuery to avoid deleting
> everything upon each update?
>
> Is there an example or doc that shows how to do all this?
>
> Regards,
>
> Joe
>
>
> On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
>
>> You can define both multiple entities in the same file and nested
>> entities if your list comes from an external source (e.g. a text file
>> of URLs).
>> You can also trigger DIH with a name of a specific entity to load just
>> that.
>> You can even pass DIH configuration file when you are triggering the
>> processing start, so you can have different files completely for
>> initial load and update. Though you can just do the same with
>> entities.
>>
>> The only thing to be aware of is that before an entity definition is
>> processed, a delete command is run. By default, it's "delete all", so
>> executing one entity will delete everything but then just populate
>> that one entity's results. You can avoid that by defining
>> preImportDeleteQuery and having a clear identifier on content
>> generated by each entity (e.g. source, either extracted or manually
>> added with TemplateTransformer).
>>
>> Regards,
>>   Alex.
>>
>> 
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 23 January 2015 at 11:15, Carl Roberts <
>> carl.roberts.zap...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have the RSS DIH example working with my own RSS feed - here is the
>>> configuration for it.
>>>
>>> 
>>>
>>>
>>>>>pk="link"
>>>url="https://nvd.nist.gov/download/nvd-rss.xml";
>>>processor="XPathEntityProcessor"
>>>forEach="/RDF/item"
>>>transformer="DateFormatTransformer">
>>>
>>>>> commonField="true" />
>>>>> commonField="true"
>>> />
>>>>> commonField="true" />
>>>>> commonField="true"
>>> />
>>>
>>>
>>>
>>> 
>>>
>>> However, my problem is that I also have to load multiple XML feeds
>>> into
>>> the
>>> same

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
That's what the phonetic filter is doing - transforming text into phonetic
codes at index time, and at query time as well, to do the phonetic matching in
the query. The actual phonetic codes are stored in the index for the purposes
of query matching.
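
As a rough sketch of such a chain (assuming DoubleMetaphone, since your schema
fragment below shows a maxCodeLength attribute; the type name here is made up):

  <fieldType name="text_phonetic" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.DoubleMetaphoneFilterFactory" inject="false" maxCodeLength="4"/>
    </analyzer>
  </fieldType>

With inject="false" only the phonetic codes are indexed as terms; with
inject="true" the original tokens are kept alongside the codes.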

-- Jack Krupansky

On Fri, Jan 23, 2015 at 12:57 PM, Amit Jha  wrote:

> Can I extend solr to add phonetic codes at time of indexing as uuid field
> getting added. Because I want to preprocess the metaphone code because I
> calculate the code on runtime will give me some performance hit.
>
> Rgds
> AJ
>
> > On Jan 23, 2015, at 5:37 PM, Jack Krupansky 
> wrote:
> >
> > Your app can use the field analysis API (FieldAnalysisRequestHandler) to
> > query Solr for what the resulting field values are for each filter in the
> > analysis chain for a given input string. This is what the Solr Admin UI
> > Analysis web page uses.
> >
> > See:
> >
> http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
> > and in solrconfig.xml
> >
> >
> > -- Jack Krupansky
> >
> >> On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha  wrote:
> >>
> >> Hi,
> >>
> >> I need to know how can I retrieve phonetic codes. Does solr provide it
> as
> >> part of result? I need codes for record matching.
> >>
> >> *following is schema fragment:*
> >>
> >>  >> class="solr.TextField" >
> >>  
> >>
> >> >> maxCodeLength="4"/>
> >>  
> >>
> >>
> >>  stored="true"/>
> >>  
> >>  
> >>   stored="true"/>
> >>
> >> 
> >> 
> >>
>


multiple data source indexing through data import handler

2015-01-23 Thread Qiu Mo
I am indexing data from two different databases, but I can't add the second 
database to the indexing - can anyone help?  Below is my data-config.xml
























My log indicates that '${item.ID}' is not catching any value from the entity 'item'.

Thanks,

Joe Moore


Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts
Excellent - thanks Shalin.  But how does delta-import work?  Does it do 
a clean also?  Does it require a unique Id?  Does it update existing 
records and only add when necessary?


And, how would I go about unzipping the content from a URL to then 
import the unzipped XML?  Is the recommended way to extend the 
URLDataSource class or is there any built-in logic to plug in 
pre-processing handlers?



And,
On 1/23/15, 2:39 PM, Shalin Shekhar Mangar wrote:

If you add clean=false as a parameter to the full-import then deletion is
disabled. Since you are ingesting RSS there is no need for deletion at all
I guess.

On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts 
wrote:
OK - Thanks for the doc.

Is it possible to just provide an empty value to preImportDeleteQuery to
disable the delete prior to import?

Will the data still be deleted for each entity during a delta-import
instead of full-import?

Is there any capability in the handler to unzip an XML file from a URL
prior to reading it or can I perhaps hook a custom pre-processing handler?

Regards,

Joe



On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:


https://cwiki.apache.org/confluence/display/solr/
Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Admin UI has the interface, so you can play there once you define it.

You do have to use Curl, there is no built-in scheduler.

Regards,
 Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 13:29, Carl Roberts 
wrote:


Hi Alex,

If I am understanding this correctly, I can define multiple entities like
this?


  
  
  
  ...


How would I trigger loading certain entities during start?

How would I trigger loading other entities during update?

Is there a way to set an auto-update for certain entities so that I don't
have to invoke an update via curl?

Where / how do I specify the preImportDeleteQuery to avoid deleting
everything upon each update?

Is there an example or doc that shows how to do all this?

Regards,

Joe


On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:


You can define both multiple entities in the same file and nested
entities if your list comes from an external source (e.g. a text file
of URLs).
You can also trigger DIH with a name of a specific entity to load just
that.
You can even pass DIH configuration file when you are triggering the
processing start, so you can have different files completely for
initial load and update. Though you can just do the same with
entities.

The only thing to be aware of is that before an entity definition is
processed, a delete command is run. By default, it's "delete all", so
executing one entity will delete everything but then just populate
that one entity's results. You can avoid that by defining
preImportDeleteQuery and having a clear identifier on content
generated by each entity (e.g. source, either extracted or manually
added with TemplateTransformer).

Regards,
  Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 11:15, Carl Roberts <
carl.roberts.zap...@gmail.com>
wrote:


Hi,

I have the RSS DIH example working with my own RSS feed - here is the
configuration for it.


   
   
   https://nvd.nist.gov/download/nvd-rss.xml";
   processor="XPathEntityProcessor"
   forEach="/RDF/item"
   transformer="DateFormatTransformer">

   
   
   
   

   
   


However, my problem is that I also have to load multiple XML feeds into
the
same core.  Here is one example (there are about 10 of them):

http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip


Is there any built-in functionality that would allow me to do this?
Basically, the use-case is to load and index all the XML ZIP files
first,
and then check the RSS feed every two hours and update the indexes with
any
new ones.

Regards,

Joe









Re: How to inject custom response data after results have been sorted

2015-01-23 Thread Chris Hostetter

: If you just need to transform an individual result, that can be done by a
: custom DocTransformer. But from your email, I think you need a custom
: SearchComponent.

if your PostFilter has already collected all of the info you need, and you 
now just want to return a subset of that information that corresponds to 
the individual documents being returned on the current "page" of results 
(ie: the current DocList) then a custom DocTransformer should probably be 
enough as long as your PostFilter puts the computed data in the request 
context.

see for example how the ElevatedMarkerFactory works in conjunction with 
the QueryElevationComponent...

https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
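
As a concrete reference point (this is the stock elevation marker, not your 
transformer), the factory is registered in solrconfig.xml and then requested 
via the fl param:

  <transformer name="elevated" class="org.apache.solr.response.transform.ElevatedMarkerFactory" />

  ...&fl=id,score,[elevated]

a custom DocTransformerFactory that reads whatever your PostFilter stashed in 
the request context gets wired up the same way, just with your own name and 
class.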

-Hoss
http://www.lucidworks.com/


Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Shalin Shekhar Mangar
If you add clean=false as a parameter to the full-import then deletion is
disabled. Since you are ingesting RSS there is no need for deletion at all
I guess.
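
For example (assuming the handler is registered at /dataimport and the core is
the nvd-rss one from your config):

  curl "http://localhost:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&commit=true"

clean=false skips the initial delete-all, and commit=true makes the imported
docs visible once the run finishes.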

On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts  wrote:

> OK - Thanks for the doc.
>
> Is it possible to just provide an empty value to preImportDeleteQuery to
> disable the delete prior to import?
>
> Will the data still be deleted for each entity during a delta-import
> instead of full-import?
>
> Is there any capability in the handler to unzip an XML file from a URL
> prior to reading it or can I perhaps hook a custom pre-processing handler?
>
> Regards,
>
> Joe
>
>
>
> On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:
>
>> https://cwiki.apache.org/confluence/display/solr/
>> Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
>>
>> Admin UI has the interface, so you can play there once you define it.
>>
>> You do have to use Curl, there is no built-in scheduler.
>>
>> Regards,
>> Alex.
>> 
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 23 January 2015 at 13:29, Carl Roberts 
>> wrote:
>>
>>> Hi Alex,
>>>
>>> If I am understanding this correctly, I can define multiple entities like
>>> this?
>>>
>>> 
>>>  
>>>  
>>>  
>>>  ...
>>> 
>>>
>>> How would I trigger loading certain entities during start?
>>>
>>> How would I trigger loading other entities during update?
>>>
>>> Is there a way to set an auto-update for certain entities so that I don't
>>> have to invoke an update via curl?
>>>
>>> Where / how do I specify the preImportDeleteQuery to avoid deleting
>>> everything upon each update?
>>>
>>> Is there an example or doc that shows how to do all this?
>>>
>>> Regards,
>>>
>>> Joe
>>>
>>>
>>> On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
>>>
 You can define both multiple entities in the same file and nested
 entities if your list comes from an external source (e.g. a text file
 of URLs).
 You can also trigger DIH with a name of a specific entity to load just
 that.
 You can even pass DIH configuration file when you are triggering the
 processing start, so you can have different files completely for
 initial load and update. Though you can just do the same with
 entities.

 The only thing to be aware of is that before an entity definition is
 processed, a delete command is run. By default, it's "delete all", so
 executing one entity will delete everything but then just populate
 that one entity's results. You can avoid that by defining
 preImportDeleteQuery and having a clear identifier on content
 generated by each entity (e.g. source, either extracted or manually
 added with TemplateTransformer).

 Regards,
  Alex.

 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 23 January 2015 at 11:15, Carl Roberts <
 carl.roberts.zap...@gmail.com>
 wrote:

> Hi,
>
> I have the RSS DIH example working with my own RSS feed - here is the
> configuration for it.
>
> 
>   
>   
>      pk="link"
>   url="https://nvd.nist.gov/download/nvd-rss.xml";
>   processor="XPathEntityProcessor"
>   forEach="/RDF/item"
>   transformer="DateFormatTransformer">
>
>    commonField="true" />
>    commonField="true"
> />
>    commonField="true" />
>    commonField="true"
> />
>
>   
>   
> 
>
> However, my problem is that I also have to load multiple XML feeds into
> the
> same core.  Here is one example (there are about 10 of them):
>
> http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip
>
>
> Is there any built-in functionality that would allow me to do this?
> Basically, the use-case is to load and index all the XML ZIP files
> first,
> and then check the RSS feed every two hours and update the indexes with
> any
> new ones.
>
> Regards,
>
> Joe
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts

OK - Thanks for the doc.

Is it possible to just provide an empty value to preImportDeleteQuery to 
disable the delete prior to import?


Will the data still be deleted for each entity during a delta-import 
instead of full-import?


Is there any capability in the handler to unzip an XML file from a URL 
prior to reading it or can I perhaps hook a custom pre-processing handler?


Regards,

Joe


On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Admin UI has the interface, so you can play there once you define it.

You do have to use Curl, there is no built-in scheduler.

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 13:29, Carl Roberts  wrote:

Hi Alex,

If I am understanding this correctly, I can define multiple entities like
this?


 
 
 
 ...


How would I trigger loading certain entities during start?

How would I trigger loading other entities during update?

Is there a way to set an auto-update for certain entities so that I don't
have to invoke an update via curl?

Where / how do I specify the preImportDeleteQuery to avoid deleting
everything upon each update?

Is there an example or doc that shows how to do all this?

Regards,

Joe


On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:

You can define both multiple entities in the same file and nested
entities if your list comes from an external source (e.g. a text file
of URLs).
You can also trigger DIH with a name of a specific entity to load just
that.
You can even pass DIH configuration file when you are triggering the
processing start, so you can have different files completely for
initial load and update. Though you can just do the same with
entities.

The only thing to be aware of is that before an entity definition is
processed, a delete command is run. By default, it's "delete all", so
executing one entity will delete everything but then just populate
that one entity's results. You can avoid that by defining
preImportDeleteQuery and having a clear identifier on content
generated by each entity (e.g. source, either extracted or manually
added with TemplateTransformer).

Regards,
 Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 11:15, Carl Roberts 
wrote:

Hi,

I have the RSS DIH example working with my own RSS feed - here is the
configuration for it.


  
  
  https://nvd.nist.gov/download/nvd-rss.xml";
  processor="XPathEntityProcessor"
  forEach="/RDF/item"
  transformer="DateFormatTransformer">

  
  
  
  

  
  


However, my problem is that I also have to load multiple XML feeds into
the
same core.  Here is one example (there are about 10 of them):

http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip


Is there any built-in functionality that would allow me to do this?
Basically, the use-case is to load and index all the XML ZIP files first,
and then check the RSS feed every two hours and update the indexes with
any
new ones.

Regards,

Joe






Re: How to inject custom response data after results have been sorted

2015-01-23 Thread Shalin Shekhar Mangar
If you just need to transform an individual result, that can be done by a
custom DocTransformer. But from your email, I think you need a custom
SearchComponent.

On Fri, Jan 23, 2015 at 6:23 PM, tedsolr  wrote:

> Hello! With the help of this community I have solved 2 problems on my way
> to
> creating a search that collapses documents based on multiple fields. The
> CollapsingQParserPlugin was key.
>
> I have a new problem now. All the custom stats I generate in my custom
> QParser makes for way to much data to simply write out in the response. I
> need to filter that data so I only return the stats the user will see on
> one
> page. Say my search returns 800K collapsed docs - in the
> DelegatingCollector's collect() method I am computing some info for each
> collapsed group - that's 800K map entries.
>
> I can't filter the stats in my post filter implementation because the
> results have not been sorted. So I need a new downstream component that can
> read the sorted results, and grab the custom stats from my post filter. Can
> someone recommend a suggestion? Is this a SearchComponent extension? Where
> is the proper hook for examining results after sorting?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-inject-custom-response-data-after-results-have-been-sorted-tp4181545.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Sporadic Socket Timeout Error during Import

2015-01-23 Thread Shalin Shekhar Mangar
The default is 10 seconds and you can increase it by adding a "readTimeout"
attribute (whose value is in milliseconds) in the URLDataSource e.g.

<dataSource type="URLDataSource" readTimeout="30000"/>

On Fri, Jan 23, 2015 at 6:33 PM, Carl Roberts  wrote:

> Hi,
>
> I am using the DIH RSS example and I am running into a sporadic socket
> timeout error during every 3rd or 4th request. Below is the stack trace.
> What is the default socket timeout for reads and how can I increase it?
>
>
> 15046 [Thread-17] ERROR org.apache.solr.handler.dataimport.URLDataSource
> – Exception thrown while getting data
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
> at sun.security.ssl.InputRecord.read(InputRecord.java:480)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
> at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
> at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
> HttpURLConnection.java:1323)
> at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(
> HttpsURLConnectionImpl.java:254)
> at org.apache.solr.handler.dataimport.URLDataSource.
> getData(URLDataSource.java:98)
> at org.apache.solr.handler.dataimport.URLDataSource.
> getData(URLDataSource.java:42)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(
> XPathEntityProcessor.java:283)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(
> XPathEntityProcessor.java:224)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(
> XPathEntityProcessor.java:204)
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(
> EntityProcessorWrapper.java:243)
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:476)
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:415)
> at org.apache.solr.handler.dataimport.DocBuilder.
> doFullDump(DocBuilder.java:330)
> at org.apache.solr.handler.dataimport.DocBuilder.execute(
> DocBuilder.java:232)
> at org.apache.solr.handler.dataimport.DataImporter.
> doFullImport(DataImporter.java:416)
> at org.apache.solr.handler.dataimport.DataImporter.
> runCmd(DataImporter.java:480)
> at org.apache.solr.handler.dataimport.DataImporter$1.run(
> DataImporter.java:461)
> 815049 [Thread-17] ERROR org.apache.solr.handler.dataimport.DocBuilder –
> Exception while processing: nvd-rss document : SolrInputDocument(fields:
> []):org.apache.solr.handler.dataimport.DataImportHandlerException:
> Exception in invoking url https://nvd.nist.gov/download/nvd-rss.xml
> Processing Document # 1
> at org.apache.solr.handler.dataimport.URLDataSource.
> getData(URLDataSource.java:115)
> at org.apache.solr.handler.dataimport.URLDataSource.
> getData(URLDataSource.java:42)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(
> XPathEntityProcessor.java:283)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(
> XPathEntityProcessor.java:224)
> at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(
> XPathEntityProcessor.java:204)
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(
> EntityProcessorWrapper.java:243)
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:476)
> at org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:415)
> at org.apache.solr.handler.dataimport.DocBuilder.
> doFullDump(DocBuilder.java:330)
> at org.apache.solr.handler.dataimport.DocBuilder.execute(
> DocBuilder.java:232)
> at org.apache.solr.handler.dataimport.DataImporter.
> doFullImport(DataImporter.java:416)
> at org.apache.solr.handler.dataimport.DataImporter.
> runCmd(DataImporter.java:480)
> at org.apache.solr.handler.dataimport.DataImporter$1.run(
> DataImporter.java:461)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
> at sun.security.ssl.InputRecord.read(InputRecord.java:480)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
> at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
> at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
> a

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Admin UI has the interface, so you can play there once you define it.

You do have to use Curl, there is no built-in scheduler.
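
If you want the check-every-two-hours behaviour, a cron entry calling DIH is
the usual workaround (host, core name and entity name below are assumptions):

  0 */2 * * * curl -s "http://localhost:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=nvd-rss" > /dev/null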

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 13:29, Carl Roberts  wrote:
> Hi Alex,
>
> If I am understanding this correctly, I can define multiple entities like
> this?
>
> 
> 
> 
> 
> ...
> 
>
> How would I trigger loading certain entities during start?
>
> How would I trigger loading other entities during update?
>
> Is there a way to set an auto-update for certain entities so that I don't
> have to invoke an update via curl?
>
> Where / how do I specify the preImportDeleteQuery to avoid deleting
> everything upon each update?
>
> Is there an example or doc that shows how to do all this?
>
> Regards,
>
> Joe
>
>
> On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
>>
>> You can define both multiple entities in the same file and nested
>> entities if your list comes from an external source (e.g. a text file
>> of URLs).
>> You can also trigger DIH with a name of a specific entity to load just
>> that.
>> You can even pass DIH configuration file when you are triggering the
>> processing start, so you can have different files completely for
>> initial load and update. Though you can just do the same with
>> entities.
>>
>> The only thing to be aware of is that before an entity definition is
>> processed, a delete command is run. By default, it's "delete all", so
>> executing one entity will delete everything but then just populate
>> that one entity's results. You can avoid that by defining
>> preImportDeleteQuery and having a clear identifier on content
>> generated by each entity (e.g. source, either extracted or manually
>> added with TemplateTransformer).
>>
>> Regards,
>> Alex.
>>
>> 
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 23 January 2015 at 11:15, Carl Roberts 
>> wrote:
>>>
>>> Hi,
>>>
>>> I have the RSS DIH example working with my own RSS feed - here is the
>>> configuration for it.
>>>
>>> 
>>>  
>>>  
>>>  >>  pk="link"
>>>  url="https://nvd.nist.gov/download/nvd-rss.xml";
>>>  processor="XPathEntityProcessor"
>>>  forEach="/RDF/item"
>>>  transformer="DateFormatTransformer">
>>>
>>>  >> commonField="true" />
>>>  >> commonField="true"
>>> />
>>>  >> commonField="true" />
>>>  >> commonField="true"
>>> />
>>>
>>>  
>>>  
>>> 
>>>
>>> However, my problem is that I also have to load multiple XML feeds into
>>> the
>>> same core.  Here is one example (there are about 10 of them):
>>>
>>> http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip
>>>
>>>
>>> Is there any built-in functionality that would allow me to do this?
>>> Basically, the use-case is to load and index all the XML ZIP files first,
>>> and then check the RSS feed every two hours and update the indexes with
>>> any
>>> new ones.
>>>
>>> Regards,
>>>
>>> Joe
>>>
>>>
>


Sporadic Socket Timeout Error during Import

2015-01-23 Thread Carl Roberts

Hi,

I am using the DIH RSS example and I am running into a sporadic socket 
timeout error during every 3rd or 4th request. Below is the stack trace. 
What is the default socket timeout for reads and how can I increase it?



15046 [Thread-17] ERROR org.apache.solr.handler.dataimport.URLDataSource 
– Exception thrown while getting data

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at 
org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:98)
at 
org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:42)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
815049 [Thread-17] ERROR org.apache.solr.handler.dataimport.DocBuilder – 
Exception while processing: nvd-rss document : SolrInputDocument(fields: 
[]):org.apache.solr.handler.dataimport.DataImportHandlerException: 
Exception in invoking url https://nvd.nist.gov/download/nvd-rss.xml 
Processing Document # 1
at 
org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:115)
at 
org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:42)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at 
sun.net.www.protocol.http.HttpURLConnection.g

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts

Hi Alex,

If I am understanding this correctly, I can define multiple entities 
like this?






...


How would I trigger loading certain entities during start?

How would I trigger loading other entities during update?

Is there a way to set an auto-update for certain entities so that I 
don't have to invoke an update via curl?


Where / how do I specify the preImportDeleteQuery to avoid deleting 
everything upon each update?


Is there an example or doc that shows how to do all this?

Regards,

Joe

On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:

You can define both multiple entities in the same file and nested
entities if your list comes from an external source (e.g. a text file
of URLs).
You can also trigger DIH with a name of a specific entity to load just that.
You can even pass DIH configuration file when you are triggering the
processing start, so you can have different files completely for
initial load and update. Though you can just do the same with
entities.

The only thing to be aware of is that before an entity definition is
processed, a delete command is run. By default, it's "delete all", so
executing one entity will delete everything but then just populate
that one entity's results. You can avoid that by defining
preImportDeleteQuery and having a clear identifier on content
generated by each entity (e.g. source, either extracted or manually
added with TemplateTransformer).

Regards,
Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 11:15, Carl Roberts  wrote:

Hi,

I have the RSS DIH example working with my own RSS feed - here is the
configuration for it.


 
 
 https://nvd.nist.gov/download/nvd-rss.xml";
 processor="XPathEntityProcessor"
 forEach="/RDF/item"
 transformer="DateFormatTransformer">

 
 
 
 

 
 


However, my problem is that I also have to load multiple XML feeds into the
same core.  Here is one example (there are about 10 of them):

http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip


Is there any built-in functionality that would allow me to do this?
Basically, the use-case is to load and index all the XML ZIP files first,
and then check the RSS feed every two hours and update the indexes with any
new ones.

Regards,

Joe






How to inject custom response data after results have been sorted

2015-01-23 Thread tedsolr
Hello! With the help of this community I have solved 2 problems on my way to
creating a search that collapses documents based on multiple fields. The
CollapsingQParserPlugin was key.

I have a new problem now. All the custom stats I generate in my custom
QParser make for way too much data to simply write out in the response. I
need to filter that data so I only return the stats the user will see on one
page. Say my search returns 800K collapsed docs - in the
DelegatingCollector's collect() method I am computing some info for each
collapsed group - that's 800K map entries.

I can't filter the stats in my post filter implementation because the
results have not been sorted. So I need a new downstream component that can
read the sorted results, and grab the custom stats from my post filter. Can
someone recommend an approach? Is this a SearchComponent extension? Where
is the proper hook for examining results after sorting?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-inject-custom-response-data-after-results-have-been-sorted-tp4181545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Retrieving Phonetic Code as result

2015-01-23 Thread Amit Jha
Can I extend Solr to add phonetic codes at indexing time, the same way a uuid 
field gets added? I want to precompute the metaphone code, because calculating 
it at runtime will give me a performance hit.

Rgds
AJ

> On Jan 23, 2015, at 5:37 PM, Jack Krupansky  wrote:
> 
> Your app can use the field analysis API (FieldAnalysisRequestHandler) to
> query Solr for what the resulting field values are for each filter in the
> analysis chain for a given input string. This is what the Solr Admin UI
> Analysis web page uses.
> 
> See:
> http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
> and in solrconfig.xml
> 
> 
> -- Jack Krupansky
> 
>> On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha  wrote:
>> 
>> Hi,
>> 
>> I need to know how can I retrieve phonetic codes. Does solr provide it as
>> part of result? I need codes for record matching.
>> 
>> *following is schema fragment:*
>> 
>> > class="solr.TextField" >
>>  
>>
>>> maxCodeLength="4"/>
>>  
>>
>> 
>> 
>>  
>>  
>>  
>> 
>> 
>> 
>> 


Re: Suggester Example In Documentation Not Working

2015-01-23 Thread Chris Hostetter

: However, you will notice on page 228, under the section "Suggester", it 
: gives an example of a suggester search component using 
: solr.SpellCheckComponet.
...
: So it would appear the solr.SuggestComponent has been around since 4.7, 
: but the documentation has not caught up with the changes. Which is the 
: source of a little confusion.

Ah -- ok .. yeah, sorry ...

You are correct, there was definitely a lag in having the ref guide updated 
to account for the new SuggestComponent -- I didn't realize that.

: Nevertheless, I had hoped to find a simple working example that I could 
: use as a starting point to get the solr.SuggestComponent working so that 
: I might play around with it and make it do what I want. The suggester 
: appears to have many parameters and options, of which, several contain 
: little or no explanation.

the params should all be documented in the ref guide *now* -- and you are 
correct, that consulting the current ref guide to understand what those 
params do will likely be helpful to you -- i guess the main take away of 
my comment "#1" was to keep in mind that you may find some params 
documented for 5.0 which didn't exist in 4.8.  (I'm not sure)

as far as starting with a simple example -- there is absolutely an example 
of using the SuggestComponent in the 4.8 sample solrconfig.xml, and if you 
index the exampledocs you can see it produce suggestions with a URL like 
this...

http://localhost:8983/solr/collection1/suggest?suggest.dictionary=mySuggester&suggest.q=elec

...but my point "#2" is still very important to keep in mind -- that URL 
gives good suggestions for "elec" precisely because of what terms exist in 
the example docs that were indexed -- the URL you posted is only going to 
give interesting suggestions if there are terms in your index (in the 
configured fields) that are relevant.  if i try this URL...

http://localhost:8983/solr/collection1/suggest?suggest.dictionary=mySuggester&suggest.q=kern

...i get no suggestions, because none of the indexed docs have any words 
starting with "kern"

in general: posting the examples of URLs you have tried and gotten no good 
suggest results from isn't enough for anyone to help give you guidance 
unless you also post the specifics of the documents you indexed.



: 2) the behavior of the suggester is very specific to the contents of the 
: dictionary built -- the examples on that page apply to the example docs 
: included with solr -- hence the techproduct data, and the example queries 
: for input like "elec" suggesting "electronics" 
: 
: no where on that page is an example using the query "kern" -- wether or 
: not that input would return a suggestion is going to be entirely dependent 
: on wether the dictionary you built contains any similar terms to suggest. 
: 
: if you can please post more details about your documents -- ideally a full 
: set of all the documents in your index (using a small test index of 
: course) that may help to understand the results you are getting. 


-Hoss
http://www.lucidworks.com/


Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
You can define both multiple entities in the same file and nested
entities if your list comes from an external source (e.g. a text file
of URLs).
You can also trigger DIH with a name of a specific entity to load just that.
You can even pass DIH configuration file when you are triggering the
processing start, so you can have different files completely for
initial load and update. Though you can just do the same with
entities.

The only thing to be aware of is that before an entity definition is
processed, a delete command is run. By default, it's "delete all", so
executing one entity will delete everything but then just populate
that one entity's results. You can avoid that by defining
preImportDeleteQuery and having a clear identifier on content
generated by each entity (e.g. source, either extracted or manually
added with TemplateTransformer).
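
A sketch of what that looks like (the names are invented; the important bits
are preImportDeleteQuery, the template-generated source field, and triggering
a single entity by name):

  <entity name="feed1"
          preImportDeleteQuery="source:feed1"
          transformer="TemplateTransformer"
          ...>
    <field column="source" template="feed1"/>
    ...
  </entity>

and then /dataimport?command=full-import&entity=feed1 reloads just that
entity, deleting only the docs whose source is feed1 beforehand.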

Regards,
   Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 January 2015 at 11:15, Carl Roberts  wrote:
> Hi,
>
> I have the RSS DIH example working with my own RSS feed - here is the
> configuration for it.
>
> 
> 
> 
>  pk="link"
> url="https://nvd.nist.gov/download/nvd-rss.xml";
> processor="XPathEntityProcessor"
> forEach="/RDF/item"
> transformer="DateFormatTransformer">
>
> 
>  />
>  commonField="true" />
>  />
>
> 
> 
> 
>
> However, my problem is that I also have to load multiple XML feeds into the
> same core.  Here is one example (there are about 10 of them):
>
> http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip
>
>
> Is there any built-in functionality that would allow me to do this?
> Basically, the use-case is to load and index all the XML ZIP files first,
> and then check the RSS feed every two hours and update the indexes with any
> new ones.
>
> Regards,
>
> Joe
>
>


Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts

Hi,

I have the RSS DIH example working with my own RSS feed - here is the 
configuration for it.





https://nvd.nist.gov/download/nvd-rss.xml";
processor="XPathEntityProcessor"
forEach="/RDF/item"
transformer="DateFormatTransformer">

commonField="true" />
commonField="true" />
commonField="true" />
commonField="true" />
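
For reference, the entity above follows the stock Solr RSS DIH example; written
out in full it is along these lines (the field columns and xpath values here
are representative, not an exact copy of my config):

  <dataConfig>
    <dataSource type="URLDataSource"/>
    <document>
      <entity name="nvd-rss"
              pk="link"
              url="https://nvd.nist.gov/download/nvd-rss.xml"
              processor="XPathEntityProcessor"
              forEach="/RDF/item"
              transformer="DateFormatTransformer">

        <field column="title"   xpath="/RDF/item/title"       commonField="true"/>
        <field column="link"    xpath="/RDF/item/link"        commonField="true"/>
        <field column="summary" xpath="/RDF/item/description" commonField="true"/>
        <field column="date"    xpath="/RDF/item/date"        commonField="true"/>
      </entity>
    </document>
  </dataConfig>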






However, my problem is that I also have to load multiple XML feeds into 
the same core.  Here is one example (there are about 10 of them):


http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip


Is there any built-in functionality that would allow me to do this? 
Basically, the use-case is to load and index all the XML ZIP files 
first, and then check the RSS feed every two hours and update the 
indexes with any new ones.


Regards,

Joe




Re: SolrCloud timing out marking node as down during startup.

2015-01-23 Thread Shalin Shekhar Mangar
Hi Mike,

This is a bug which was fixed in Solr 4.10.3 via
http://issues.apache.org/jira/browse/SOLR-6610 and it slows down cluster
restarts. Since you have a single node cluster, you will run into it on
every restart.

On Thu, Jan 22, 2015 at 6:42 PM, Michael Roberts 
wrote:

> Hi,
>
> I'm seeing some odd behavior that I am hoping someone could explain to me.
>
> The configuration I'm using to repro the issue, has a ZK cluster and a
> single Solr instance. The instance has 10 Cores, and none of the cores are
> sharded.
>
> The initial startup is fine, the Solr instance comes up and we build our
> index. However if the Solr instance exits uncleanly (killed rather than
> sent a SIGINT), the next time it starts I see the following in the logs.
>
> 2015-01-22 09:56:23.236 -0800 (,,,) localhost-startStop-1 : INFO
> org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from
> ZooKeeper...
> 2015-01-22 09:56:30.008 -0800 (,,,) localhost-startStop-1-EventThread :
> DEBUG org.apache.solr.common.cloud.SolrZkClient - Submitting job to respond
> to event WatchedEvent state:SyncConnected type:NodeChildrenChanged
> path:/live_nodes
> 2015-01-22 09:56:30.008 -0800 (,,,) zkCallback-2-thread-1 : DEBUG
> org.apache.solr.common.cloud.ZkStateReader - Updating live nodes... (0)
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : WARN
> org.apache.solr.cloud.ZkController - Timed out waiting to see all nodes
> published as DOWN in our cluster state.
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : INFO
> org.apache.solr.cloud.ZkController - Register node as live in
> ZooKeeper:/live_nodes/10.18.8.113:11000_solr
> My question is about "Timed out waiting to see all nodes published as DOWN
> in our cluster state."
>
> Cursory look at the code, we seem to iterate through all
> Collections/Shards, and mark the state as Down. These notifications are
> offered to the Overseer, who I believe updates the ZK state. We then wait
> for the ZK state to update, with the 60 second timeout.
>
> However, it looks like the Overseer is not started until after we wait for
> the timeout. So, in a single instance scenario we'll always have to wait
> for the timeout.
>
> Is this the expected behavior (and just a side effect of running a single
> instance in cloud mode), or is my understanding of the Overseer/Zk
> relationship incorrect?
>
> Thanks.
>
> .Mike
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: How do you query a sentence composed of multiple words in a description field?

2015-01-23 Thread Walter Underwood
It isn’t that complicated. You need to understand URL escaping for working with 
any REST client. As soon as you need to read the logs, you’ll need to 
understand it.

The double quote becomes %22 and the colon becomes %3A. In a parameter, the 
spaces can be +, but in a path they need to be %20. 

http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary%3A%22Oracle+Fusion%22
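
Another option, if a POST is acceptable, is to let curl do the encoding for you:

  curl "http://localhost:8983/solr/nvd-rss/select" \
       --data-urlencode 'q=summary:"Oracle Fusion"' \
       --data-urlencode wt=json --data-urlencode indent=true

--data-urlencode switches curl to a POST, which Solr's /select handler accepts.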

wunder

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Jan 23, 2015, at 7:08 AM, Carl Roberts  wrote:

> Thanks Erick,
> 
> I think I am going to start using the browser for testing...:) Perhaps also a 
> REST client for the Mac.
> 
> Regards,
> 
> Joe
> 
> On 1/22/15, 6:56 PM, Erick Erickson wrote:
>> Have you considered using the admin/query form? Lots of escaping is done
>> there for you. Once you have the form of the query down and know what to
>> expect, it's probably easier to enter "escaping hell" with curl and the
>> like
>> 
>> And what is your schema definition for the field in question? the
>> admin/analysis page can help a lot here.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Jan 22, 2015 at 3:51 PM, Carl Roberts >> wrote:
>>> Thanks Shawn - I tried this but it does not work.  I don't even get a
>>> response from curl when I try that format and when I look at the logging on
>>> the console for Jetty I don't see anything new - it seems that the request
>>> is not even making it to the server.
>>> 
>>> 
>>> 
>>> On 1/22/15, 6:43 PM, Shawn Heisey wrote:
>>> 
 On 1/22/2015 4:31 PM, Carl Roberts wrote:
 
> Hi Walter,
> 
> If I try this from my Mac shell:
> 
>  curl
> http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary
> :"Oracle
> Fusion"
> 
> I don't get a response.
> 
 Quotes are a special character to the shell on your mac, and get removed
 from what the curl command sees.  You'll need to put the whole thing in
 quotes (so that characters like & are not interpreted by the shell) and
 then escape the quotes that you want to actually be handled by curl:
 
 curl
 "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary
 :\"Oracle
 Fusion\""
 
 Thanks,
 Shawn
 
 
> 



Re: Is Solr a good candidate to index 100s of nodes in one XML file?

2015-01-23 Thread Carl Roberts
I got the RSS DIH example to work with my own RSS feed and it works 
great - thanks for the help.


On 1/22/15, 11:20 AM, Carl Roberts wrote:

Thanks. I am looking at the RSS DIH example right now.


On 1/21/15, 3:15 PM, Alexandre Rafalovitch wrote:

Solr is just fine for this.

It even ships with an example of how to read an RSS file under the DIH
directory. DIH is also most likely what you will use for the first
implementation. Don't need to worry about Stax or anything, unless
your file format is very weird or has overlapping namespaces (DIH XML
parser does not care about namespaces).

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 21 January 2015 at 14:53, Carl Roberts 
 wrote:

Hi,

Is Solr a good candidate to index 100s of nodes in one XML file?

I have an RSS feed XML file that has 100s of nodes with several 
elements in
each node that I have to index, so I was planning to parse the XML 
with Stax
and extract the data from each node and add it to Solr.  There will 
always
be only one one file to start with and then a second file as the RSS 
feeds

supplies updates.  I want to return certain fields of each node when I
search certain fields of the same node.  Is Solr overkill in this case?
Should I just use Lucene instead?

Regards,

Joe






Re: How do you query a sentence composed of multiple words in a description field?

2015-01-23 Thread Carl Roberts

Thanks Erick,

I think I am going to start using the browser for testing...:) Perhaps 
also a REST client for the Mac.


Regards,

Joe

On 1/22/15, 6:56 PM, Erick Erickson wrote:

Have you considered using the admin/query form? Lots of escaping is done
there for you. Once you have the form of the query down and know what to
expect, it's probably easier to enter "escaping hell" with curl and the
like

And what is your schema definition for the field in question? the
admin/analysis page can help a lot here.

Best,
Erick

On Thu, Jan 22, 2015 at 3:51 PM, Carl Roberts 
wrote:
Thanks Shawn - I tried this but it does not work.  I don't even get a
response from curl when I try that format and when I look at the logging on
the console for Jetty I don't see anything new - it seems that the request
is not even making it to the server.



On 1/22/15, 6:43 PM, Shawn Heisey wrote:


On 1/22/2015 4:31 PM, Carl Roberts wrote:


Hi Walter,

If I try this from my Mac shell:

  curl
http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary
:"Oracle
Fusion"

I don't get a response.


Quotes are a special character to the shell on your mac, and get removed
from what the curl command sees.  You'll need to put the whole thing in
quotes (so that characters like & are not interpreted by the shell) and
then escape the quotes that you want to actually be handled by curl:

curl
"http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary
:\"Oracle
Fusion\""

Thanks,
Shawn






solr 4.7 Converting from one boost method to another using ExternalFileField

2015-01-23 Thread Parnit Pooni
Hi,

I'm currently running into issues creating a Solr query that boosts on
two ExternalFileFields. The following query seems to work, but it is extremely
long, repeats the query terms, and does not use the approach I would like to use.


http://localhost/solr/Index/select?fl=field(externalFileField1),field(externalFileField2),score&q={!boost%20b=map(field(externalFileField1),5,15,10,3)}term+{!boost%20b=map(field(externalFileField2),70,90,25,1)}term


I would like to use bq instead of {!boost b=myBoostFunction()} format as
you can see I am repeating the term again according to the following
tutorial the formats should be compatible.
http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/

sample query


http://localhost/solr/Index/select?q=*&fl=field(externalFileField1),_val1_:map(field(externalFileField2),5,15,10,3),_val2_:map(field(externalFileField2),70,90,25,1),field(externalFileField2),score&bq=_val_:map(field(externalFileField1),5,15,10,3)%20_val1_:map(field(externalFileField2),70,90,25,1)

This does not seem to work and returns documents ranked by the primary keys.

I have tried multiple queries using bq with an external file field, and the
boosting does not seem to work. An additional requirement I have is to boost
on a specific range of the ExternalFileField, and the map function helps
achieve this.
Any help is greatly appreciated.

Regards,
Parnit


Re: Using tmpfs for Solr index

2015-01-23 Thread Shawn Heisey
On 1/23/2015 2:40 AM, Toke Eskildsen wrote:
> If you have a single index on a box with enough memory to fully cache
> the index data, I would recommend just using MMapDirectory without
> involving tmpfs.

If it's Solr 4.x, I have pretty much the same advice, with one small
change.  I would actually use the default directory in Solr, which is
NRTCachingDirectoryFactory.  This is a wrapper directory implementation
that provides a small amount of memory-based caching on top of another
implementation.  On 64-bit Java, the wrapped implementation will be
MMapDirectory.  Turning on updateLog is highly recommended with the NRT
implementation.
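
For reference, the relevant pieces of a stock 4.x solrconfig.xml look roughly
like this (these are the example defaults, not a tuning recommendation):

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>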

Thanks,
shawn



Re: trying to get Apache Solr working with Dovecot.

2015-01-23 Thread Shawn Heisey
On 1/23/2015 12:11 AM, Kevin Laurie wrote:
> The solr / lucene version is 4.10.2
> 
> I am trying to figure out how to see if Dovecot and Solr can contact.
> Apparently when I make searches there seems to be no contact. I might try
> to rebuild dovecot again and see if that solves the problem.
> 
> I just checked var/log/solr and its empty. Might need to enable debugging
> on Solr.
> 
> Regarding tracing, not much as I am still relatively new(might be a
> challenge) but will figure out.
> 
> Is there any well documented manual for dovecot-solr integration?

Very likely you'll need to talk to whoever made the Solr plugin for
dovecot.  If they look at your situation and tell you that the problem
is in Solr itself and they don't know what to do, then you can come back
here with the specific logs and information that they point to.

Solr, as shipped by Apache, defaults to INFO level logging (which is
very verbose), into a file named logs/solr.log.  If the solr log is in
/var/log, then I can tell you already that either you're not using Solr
directly from Apache, or someone has changed the config.  If the Solr is
packaged by someone else, they will have information that we don't, and
they'll be better situated to help you.  If it's Solr from Apache but
the config has been modified, then we need to know what modifications
were made.

If I do a google search for "dovecot solr" (without the quotes), the
very first hit that comes up looks like it's completely relevant.  The
links at the end of that page are not very helpful -- one requires
authentication and the other talks about Solr 1.4.0, which is five years
old.

http://wiki2.dovecot.org/Plugins/FTS/Solr

Thanks,
Shawn



Re: Suggester Example In Documentation Not Working

2015-01-23 Thread Charles Sanders
Well, I'm running LucidWorks 2.9.1 which contains Solr 4.8. 

I initially was working with the Solr documentation: 
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.8.pdf
 

However, you will notice on page 228, under the section "Suggester", it gives 
an example of a suggester search component using solr.SpellCheckComponent. 

Then our support rep at LucidWorks said we should use the documentation found 
here: 
https://cwiki.apache.org/confluence/display/solr/Suggester 

This documentation is for Solr 5.0, however, you will notice the statement: 
"Solr has long had the autosuggest functionality, but Solr 4.7 introduced a new 
approach based on a dedicated SuggestComponent . " 

So it would appear the solr.SuggestComponent has been around since 4.7, but the 
documentation has not caught up with the changes, which is the source of a 
little confusion. 

Nevertheless, I had hoped to find a simple working example that I could use as 
a starting point to get the solr.SuggestComponent working so that I might play 
around with it and make it do what I want. The suggester appears to have many 
parameters and options, several of which have little or no explanation. 

No problem. I will just have to invest a little more time to unravel how the 
component works and how I can best use it. 

Thanks for your reply. 


- Original Message -

From: "Chris Hostetter"  
To: solr-user@lucene.apache.org 
Sent: Thursday, January 22, 2015 12:50:46 PM 
Subject: Re: Suggester Example In Documentation Not Working 


1) which version of Solr are you using? (note that the online HTML ref 
guide is a DRARFT that applies to 5.0 - you may want to review the 
specific released version of the ref guide that applies to your version of 
solr: http://archive.apache.org/dist/lucene/solr/ref-guide/ 

2) the behavior of the suggester is very specific to the contents of the 
dictionary built -- the examples on that page apply to the example docs 
included with solr -- hence the techproduct data, and the example queries 
for input like "elec" suggesting "electronics" 

no where on that page is an example using the query "kern" -- wether or 
not that input would return a suggestion is going to be entirely dependent 
on wether the dictionary you built contains any similar terms to suggest. 

if you can please post more details about your documents -- ideally a full 
set of all the documents in your index (using a small test index of 
course) that may help to understand the results you are getting. 



: Date: Thu, 22 Jan 2015 11:14:43 -0500 (EST) 
: From: Charles Sanders  
: Reply-To: solr-user@lucene.apache.org 
: To: solr-user@lucene.apache.org 
: Subject: Suggester Example In Documentation Not Working 
: 
: Attempting to follow the documentation found here: 
: https://cwiki.apache.org/confluence/display/solr/Suggester 
: 
: The example given in the documentation is not working. See below my 
configuration. I only changed the field names to those in my schema. Can anyone 
provide an example for this component that actually works? 
: 
:  <searchComponent name="suggest" class="solr.SuggestComponent"> 
:    <lst name="suggester"> 
:      <str name="name">mySuggester</str> 
:      <str name="lookupImpl">FuzzyLookupFactory</str> 
:      <str name="dictionaryImpl">DocumentDictionaryFactory</str> 
:      <str name="field">sugg_allText</str> 
:      <str name="weightField">suggestWeight</str> 
:      <str name="suggestAnalyzerFieldType">string</str> 
:    </lst> 
:  </searchComponent> 
: 
:  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy"> 
:    <lst name="defaults"> 
:      <str name="suggest">true</str> 
:      <str name="suggest.count">10</str> 
:      <str name="suggest.build">true</str> 
:    </lst> 
:    <arr name="components"> 
:      <str>suggest</str> 
:    </arr> 
:  </requestHandler> 
: 
: 
http://localhost:/solr/collection1/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=kern
 
: 
: 
{"responseHeader":{"status":0,"QTime":4},"command":"build","suggest":{"mySuggester":{"kern":{"numFound":0,"suggestions":[]
 
: 

-Hoss 
http://www.lucidworks.com/ 



Re: In a SolrCloud, will a solr core(shard replica) failover to its good peer when its state is not Active

2015-01-23 Thread Shawn Heisey
On 1/22/2015 11:28 PM, 汤林 wrote:
> From a testing aspect, if we would like to verify the case that a query
> request to a "down" core on a running server will be failed over to the
> good core on another running server, is there any way to make a core as
> "down" on a running server? Thanks!

I think that would depend on exactly why the core is down.  Most
failures will make the core nonexistent in Solr, thus unable to accept
queries, but if you have a problem that results in the core functioning
correctly but the cluster is unaware of that fact, then the query would
probably work.  That kind of failure shouldn't be possible, but all
software has bugs, so I'm not going to rule it out.
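
For test purposes, the usual way to get a genuinely "down" replica is to stop
the Solr process (or its servlet container) on one node and wait for its
ZooKeeper session to expire; queries against the collection are then served by
the remaining replicas. Unloading the core through the CoreAdmin API is
another option, although that removes the replica from the cluster state
rather than just marking it down -- a sketch, with host and core names purely
illustrative:

http://server2:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard1_replica2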

> We tried to change the /clusterstate.json in ZooKeeper to mark an "active"
> core as "down", but it seems only change the state in ZK, while the core
> still functions in solr server.

I have no idea what will happen with manual clusterstate manipulation.
Although I do have a small cloud install, it's running an ancient
version (4.2.1), has no sharded indexes, and I do not interact with it
on a regular basis.

Thanks,
Shawn



RE: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Ryan, Michael F. (LNG-DAY)
Here's a Jira for this: https://issues.apache.org/jira/browse/SOLR-3031

I've attached a patch there that might be useful for you.

-Michael

-Original Message-
From: Jorge Luis Betancourt González [mailto:jlbetanco...@uci.cu] 
Sent: Thursday, January 22, 2015 4:34 PM
To: solr-user@lucene.apache.org
Subject: Avoiding wildcard queries using edismax query parser

Hello all,

Currently we are using the edismax query parser in an internal application.
We've detected that some wildcard queries, including "*", are causing
performance issues, and for this particular case we're not interested in
allowing any user to request all the indexed documents.

This could easily be escaped at the application level, but right now we have
several applications (written in several programming languages) consuming from
Solr, and adding this to each application is kind of exhausting, so I'm
wondering if there is some configuration that allows us to treat these special
characters as normal alphanumeric characters.

I've tried one solution that worked before, involving the WordDelimiterFilter
and the types attribute:

<filter class="solr.WordDelimiterFilterFactory" generateNumberParts="0"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="0" preserveOriginal="0" types="characters.txt" />

and in characters.txt I've mapped the special characters into ALPHA:

+ => ALPHA 
* => ALPHA 

Any thoughts on this?


---
XII Aniversario de la creación de la Universidad de las Ciencias Informáticas. 
12 años de historia junto a Fidel. 12 de diciembre de 2014.



Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
Your app can use the field analysis API (FieldAnalysisRequestHandler) to
query Solr for what the resulting field values are for each filter in the
analysis chain for a given input string. This is what the Solr Admin UI
Analysis web page uses.

See:
http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
and in solrconfig.xml:

  <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
-- Jack Krupansky
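
For example (the field type name and input value are illustrative), a request
such as

http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=text_phonetic&analysis.fieldvalue=Smith&wt=json

returns the tokens produced by each stage of that field type's analysis chain,
including the codes emitted by the phonetic filter, and the app can read the
codes out of that response.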

On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha  wrote:

> Hi,
>
> I need to know how can I retrieve phonetic codes. Does solr provide it as
> part of result? I need codes for record matching.
>
> *following is schema fragment:*
>
>  class="solr.TextField" >
>   
> 
>  maxCodeLength="4"/>
>   
> 
>
>  
>   
>   
>   
>
> 
>  
>


Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jack Krupansky
Presence of a wildcard in a query term is detected by the traditional Solr
and edismax query parsers and causes normal term analysis to be bypassed.
As I said, wildcards are a feature that dismax specifically
doesn't support - this has nothing to do with edismax.
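
If you do want to block or rewrite such queries on the Solr side, the "custom
first component" idea mentioned further down the thread would be wired up in
solrconfig.xml roughly like this -- the component class name is hypothetical,
and its prepare() method would rewrite or reject a bare "*" query:

<searchComponent name="wildcardGuard" class="com.example.WildcardGuardComponent" />

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="first-components">
    <str>wildcardGuard</str>
  </arr>
</requestHandler>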

-- Jack Krupansky

On Fri, Jan 23, 2015 at 12:45 AM, Jorge Luis Betancourt González <
jlbetanco...@uci.cu> wrote:

> Hi Jack!
>
> Yes, that was my point, I was thinking that being edismax an extended
> version of dismas, perhaps had a switch to turn on/off this feature or
> putting some limits. I've tried the multiterm approach but with no luck,
> the "*" keeps being treated a match all query, as far as I can see from
> enabling debug output:
>
>"rawquerystring": "*",
>"querystring": "*",
>"parsedquery": "(+MatchAllDocsQuery(*:*) ()
> FunctionQuery(1.0/(3.16E-11*float(ms(const(142198920),date(lastModified)))+1.0)))/no_coord",
>
> The query gets translated into a MatchAllDocsQuery, which I think happens
> before the textual analysis.
>
> - Original Message -
> From: "Jack Krupansky" 
> To: solr-user@lucene.apache.org
> Sent: Friday, January 23, 2015 12:02:44 AM
> Subject: Re: Avoiding wildcard queries using edismax query parser
>
> The dismax query parser does not support wildcards. It is designed to be
> simpler.
>
> -- Jack Krupansky
>
> On Thu, Jan 22, 2015 at 5:57 PM, Jorge Luis Betancourt González <
> jlbetanco...@uci.cu> wrote:
>
> > I was also suspecting something like that, the odd thing was that the
> with
> > the dismax parser this seems to work, I mean passing a single * in the
> > query just like:
> >
> >
> >
> http://localhost:8983/solr/collection1/select?q=*&wt=json&indent=true&defType=dismax
> >
> > Returns:
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":3},
> >   "response":{"numFound":0,"start":0,"docs":[]
> >   },
> >   "highlighting":{}
> > }
> >
> > Which is consisten with no "*" term indexed.
> >
> > Based on what I saw with dismax, I though that perhaps a configuration
> > option existed to accomplish the same with the edismax query parser, but
> I
> > haven't found such option.
> >
> > I'm going to test with a custom search component.
> >
> > Thanks for the quick response Alex,
> >
> > Regards,
> >
> > - Original Message -
> > From: "Alexandre Rafalovitch" 
> > To: "solr-user" 
> > Sent: Thursday, January 22, 2015 4:46:08 PM
> > Subject: Re: Avoiding wildcard queries using edismax query parser
> >
> > I suspect the special characters get caught before the analyzer chains.
> >
> > But what about pre-pending a custom search components?
> >
> > Regards,
> >Alex.
> > 
> > Sign up for my Solr resources newsletter at http://www.solr-start.com/
> >
> >
> > On 22 January 2015 at 16:33, Jorge Luis Betancourt González
> >  wrote:
> > > Hello all,
> > >
> > > Currently we are using edismax query parser in an internal application,
> > we've detected that some wildcard queries including "*" are causing some
> > performance issues and for this particular case we're not interested in
> > allowing any user to request all the indexed documents.
> > >
> > > This could be easily escaped in the application level, but right now we
> > have several applications (using several programming languages) consuming
> > from Solr, and adding this into each application is kind of exhausting,
> so
> > I'm wondering if there is some configuration that allow us to treat this
> > special characters as normal alphanumeric characters.
> > >
> > > I've tried one solution that worked before, involving the
> > WordDelimiterFilter an the types attribute:
> > >
> > >  > generateNumberParts="0" catenateWords="0"
> > > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> > preserveOriginal="0" types="characters.txt" />
> > >
> > > and in characters.txt I've mapped the special characters into ALPHA:
> > >
> > > + => ALPHA
> > > * => ALPHA
> > >
> > > Any thoughts on this?
> > >
> > >
> > > ---
> > > XII Aniversario de la creación de la Universidad de las Ciencias
> > Informáticas. 12 años de historia junto a Fidel. 12 de diciembre de 2014.
> > >
> >
> >
> > ---
> > XII Aniversario de la creación de la Universidad de las Ciencias
> > Informáticas. 12 años de historia junto a Fidel. 12 de diciembre de 2014.
> >
> >
>
>
> ---
> XII Aniversario de la creación de la Universidad de las Ciencias
> Informáticas. 12 años de historia junto a Fidel. 12 de diciembre de 2014.
>
>


Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Nitin Solanki
Ok.. Is there any way to use a user-defined field instead of word and freq in
the suggestion block?

On Fri, Jan 23, 2015 at 2:33 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> I don't think it's implemented.
> I can propose to send the first request to termsComponent, that yields
> terms by prefix, then the second request can gather totaltermfreqs.
>
> On Fri, Jan 23, 2015 at 11:51 AM, Nitin Solanki 
> wrote:
>
> > Thanks Mikhail Khludnev..
> > I tried this:
> > *
> >
> http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
> > <
> >
> http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
> > >*
> > and it worked.
> > I want to know more. Can we do same thing *(totaltermfreq)* on
> suggestions
> > ? I tried "th" and get "the" is suggestion. I want to retrieve term
> > frequency not document frequency even in the suggestions. Can I do that?
> >
> > *Instance of suggestions: *
> > 
> > the
> > 897  *Here -* freq is Document frequency. I need
> > Term frequency
> > 
> >
> >
> >
> > On Fri, Jan 23, 2015 at 1:53 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > > totaltermfreq()
> > >
> > > of you need to sum term freq on docs from resultset?
> > >
> > >
> > > On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki 
> > > wrote:
> > >
> > > > I indexed some text_file files in Solr as it is. Applied "
> > > > *StandardTokenizerFactory*" and "*ShingleFilterFactory*" on text_file
> > > field
> > > >
> > > > *Configuration of Schema.xml structure below :*
> > > >  > > required="true"
> > > > multiValued="false" />
> > > >  > > > required="true" multiValued="false"/>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > * > > > positionIncrementGap="100">> > > type="index">  > > > class="solr.StandardTokenizerFactory"/>
> > >   > > > class="solr.ShingleFilterFactory" maxShingleSize="5"
> minShingleSize="2"
> > > > outputUnigrams="true"/>   > > > type="query">  > > > class="solr.StandardTokenizerFactory"/>
> > >   > > > class="solr.ShingleFilterFactory" maxShingleSize="5"
> minShingleSize="2"
> > > > outputUnigrams="true"/>  *
> > > >
> > > > *Stored Documents like:*
> > > > *[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
> > > > "text_file": "text": "text of document"} and so on ]*
> > > >
> > > > *Problem* : If I search a word in a SOLR index I get a document count
> > for
> > > > documents which contain this word, but if the word is included more
> > times
> > > > in a document, the total count is still 1 per document. I need every
> > > > returned document is counted for the number of times they have the
> > > searched
> > > > word in the field. *Example* :I see a "numFound" value of 12, but the
> > > word
> > > > "what" is included 20 times in all 12 documents. Could you help me to
> > > find
> > > > where I'm wrong, please?
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Using tmpfs for Solr index

2015-01-23 Thread Toke Eskildsen
On Fri, 2015-01-23 at 07:34 +0100, deniz wrote:
> Would it boost any performance in case the index has been switched from
> RAMDirectoryFactory to use tmpfs?

RAMDirectoryFactory does not perform well for non-small indexes, so ...
probably yes.

> Or it would simply do the same thing like MMap? 

A fully cached MMap of files on permanent storage should perform the
same as a MMap of files in tmpfs. The primary selling point for tmpfs in
this context is that you force the data to always be in RAM (remember to
turn off or severely limit the swap system).

This makes sense in a mixed environment; for example with two Solr
collections, one small which should always respond as fast as possible,
another large which has lower real-time requirements. Putting the index
of the smaller one on tmpfs should ensure this.

> And in case it would be better to use tmpfs rather than RAMDirectory or
> MMap, which directory factory would be the most feasible one for this
> purpose?

MMap with tmpfs would be my guess, as it should avoid copying of the
data from one memory area to another when accessing files. But it is not
something I have experience with.
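
A minimal sketch of that setup, with the mount point, size and core name purely
illustrative (Linux assumed; note that tmpfs is volatile, so the index has to
be copied or rebuilt there after every reboot):

mount -t tmpfs -o size=8g tmpfs /mnt/solr-tmpfs

and in solrconfig.xml for that core:

<dataDir>/mnt/solr-tmpfs/collection1/data</dataDir>
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>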

If you have a single index on a box with enough memory to fully cache
the index data, I would recommend just using MMapDirectory without
involving tmpfs.

- Toke Eskildsen, State and University Library, Denmark




Re: Field collapsing memory usage

2015-01-23 Thread Toke Eskildsen
On Thu, 2015-01-22 at 22:52 +0100, Erick Erickson wrote:
> What do you think about folding this into the Solr (or Lucene?) code
> base? Or is it to specialized?

(writing under the assumption that DVEnabler actually works as it should
for everyone and not just us)

Right now it is an explicit tool. As such, users need to find it and
learn how to use it, which is a large barrier. Most of the time it is
easier just to re-index everything.

It seems to me that it should be possible to do seamlessly instead:
Simply change the schema and reload. Old segments would have emulated
DocValues (high speed, high memory overhead), new segments would have
pure DVs. An optimize would be optional, but highly recommended.
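
In schema.xml terms, the change would just be flipping the docValues attribute
on the affected field and reloading, e.g. (field name illustrative):

<field name="group_field" type="string" indexed="true" stored="false" docValues="true"/>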

- Toke Eskildsen, State and University Library, Denmark




Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Mikhail Khludnev
I don't think it's implemented.
I can propose sending a first request to the TermsComponent, which yields
terms by prefix; a second request can then gather the totaltermfreq values.
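
For example, using the gram field from earlier in the thread and assuming the
stock /terms handler is configured:

1) http://localhost:8983/solr/collection1/terms?terms.fl=gram&terms.prefix=th&terms.limit=10&wt=json
2) http://localhost:8983/solr/collection1/select?q=*:*&rows=1&fl=totaltermfreq(gram,'the'),totaltermfreq(gram,'this')&wt=json

The first call returns the matching terms (with their document frequencies),
and the second returns the index-wide term frequency for each term plugged
into totaltermfreq().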

On Fri, Jan 23, 2015 at 11:51 AM, Nitin Solanki 
wrote:

> Thanks Mikhail Khludnev..
> I tried this:
> *
> http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
> <
> http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
> >*
> and it worked.
> I want to know more. Can we do same thing *(totaltermfreq)* on suggestions
> ? I tried "th" and get "the" is suggestion. I want to retrieve term
> frequency not document frequency even in the suggestions. Can I do that?
>
> *Instance of suggestions: *
> 
> the
> 897  *Here -* freq is Document frequency. I need
> Term frequency
> 
>
>
>
> On Fri, Jan 23, 2015 at 1:53 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > totaltermfreq()
> >
> > of you need to sum term freq on docs from resultset?
> >
> >
> > On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki 
> > wrote:
> >
> > > I indexed some text_file files in Solr as it is. Applied "
> > > *StandardTokenizerFactory*" and "*ShingleFilterFactory*" on text_file
> > field
> > >
> > > *Configuration of Schema.xml structure below :*
> > >  > required="true"
> > > multiValued="false" />
> > >  > > required="true" multiValued="false"/>
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > * > > positionIncrementGap="100">> > type="index">  > > class="solr.StandardTokenizerFactory"/>
> >   > > class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> > > outputUnigrams="true"/>   > > type="query">  > > class="solr.StandardTokenizerFactory"/>
> >   > > class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> > > outputUnigrams="true"/>  *
> > >
> > > *Stored Documents like:*
> > > *[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
> > > "text_file": "text": "text of document"} and so on ]*
> > >
> > > *Problem* : If I search a word in a SOLR index I get a document count
> for
> > > documents which contain this word, but if the word is included more
> times
> > > in a document, the total count is still 1 per document. I need every
> > > returned document is counted for the number of times they have the
> > searched
> > > word in the field. *Example* :I see a "numFound" value of 12, but the
> > word
> > > "what" is included 20 times in all 12 documents. Could you help me to
> > find
> > > where I'm wrong, please?
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Nitin Solanki
Thanks Mikhail Khludnev..
I tried this:
*http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
*
and it worked.
I want to know more. Can we do the same thing *(totaltermfreq)* on suggestions?
I tried "th" and got "the" as a suggestion. I want to retrieve term frequency,
not document frequency, even in the suggestions. Can I do that?

*Instance of suggestions: *
<lst>
  <str name="word">the</str>
  <int name="freq">897</int>    *Here -* freq is document frequency. I need term frequency.
</lst>



On Fri, Jan 23, 2015 at 1:53 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> https://cwiki.apache.org/confluence/display/solr/Function+Queries
> totaltermfreq()
>
> of you need to sum term freq on docs from resultset?
>
>
> On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki 
> wrote:
>
> > I indexed some text_file files in Solr as it is. Applied "
> > *StandardTokenizerFactory*" and "*ShingleFilterFactory*" on text_file
> field
> >
> > *Configuration of Schema.xml structure below :*
> >  required="true"
> > multiValued="false" />
> >  > required="true" multiValued="false"/>
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > * > positionIncrementGap="100">> type="index">  > class="solr.StandardTokenizerFactory"/>
>   > class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> > outputUnigrams="true"/>   > type="query">  > class="solr.StandardTokenizerFactory"/>
>   > class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> > outputUnigrams="true"/>  *
> >
> > *Stored Documents like:*
> > *[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
> > "text_file": "text": "text of document"} and so on ]*
> >
> > *Problem* : If I search a word in a SOLR index I get a document count for
> > documents which contain this word, but if the word is included more times
> > in a document, the total count is still 1 per document. I need every
> > returned document is counted for the number of times they have the
> searched
> > word in the field. *Example* :I see a "numFound" value of 12, but the
> word
> > "what" is included 20 times in all 12 documents. Could you help me to
> find
> > where I'm wrong, please?
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Query to get A-Z index

2015-01-23 Thread Priya Rodrigues
Is there a way to get an A-Z index from a field?

e.g. if the field contains Alpha, Pogo, Zoro,

it should return A, P, Z

Found something similar here
http://stackoverflow.com/questions/8974299/solr-query-by-range-of-name

But is there a way to do this without copyField?

Thanks,
Priya


Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Mikhail Khludnev
https://cwiki.apache.org/confluence/display/solr/Function+Queries
totaltermfreq()

Or do you need to sum term freqs over the docs in the result set?
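
For example, with the field and word from the question below, termfreq() gives
the per-document count and totaltermfreq() the index-wide count:

http://localhost:8983/solr/collection1/select?q=text_file:what&fl=id,termfreq(text_file,'what'),totaltermfreq(text_file,'what')&wt=json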


On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki 
wrote:

> I indexed some text_file files in Solr as it is. Applied "
> *StandardTokenizerFactory*" and "*ShingleFilterFactory*" on text_file field
>
> *Configuration of Schema.xml structure below :*
>  multiValued="false" />
>  required="true" multiValued="false"/>
>
>
>
>
>
>
>
>
>
>
> * positionIncrementGap="100">type="index">  class="solr.StandardTokenizerFactory"/>  class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> outputUnigrams="true"/>   type="query">  class="solr.StandardTokenizerFactory"/>  class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2"
> outputUnigrams="true"/>  *
>
> *Stored Documents like:*
> *[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
> "text_file": "text": "text of document"} and so on ]*
>
> *Problem* : If I search a word in a SOLR index I get a document count for
> documents which contain this word, but if the word is included more times
> in a document, the total count is still 1 per document. I need every
> returned document is counted for the number of times they have the searched
> word in the field. *Example* :I see a "numFound" value of 12, but the word
> "what" is included 20 times in all 12 documents. Could you help me to find
> where I'm wrong, please?
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics