Re: # open files with SolrCloud

2012-04-22 Thread Sami Siren
On Sat, Apr 21, 2012 at 9:57 PM, Yonik Seeley
 wrote:
> I can reproduce some kind of searcher leak issue here, even w/o
> SolrCloud, and I've opened
> https://issues.apache.org/jira/browse/SOLR-3392

With the fix integrated, I no longer see the leaking problem with my setup,
so it seems to be working now.

--
 Sami Siren


Exception fixing docBase for context [error in opening zip file]

2012-04-22 Thread Yung-chung Lin
Hi,

I am experiencing a problem starting Solr with Tomcat 6.

My system: Ubuntu 11.

ii  tomcat6        6.0.32-5ubuntu1.2            Servlet and JSP engine
ii  openjdk-6-jre  6b23~pre11-0ubuntu1.11.10.2  OpenJDK Java runtime, using Hotspot JIT

I'm using the nightly build war
file: apache-solr-4.0-2012-04-21_08-25-44.war

Can anyone give me a pointer? Thanks.

Below is the error message I got.

2012/4/23 02:24:42 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
2012/4/23 02:24:42 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 575 ms
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.32
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor ROOT.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.ContextConfig init
SEVERE: Exception fixing docBase for context [/solr]
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:131)
    at java.util.jar.JarFile.<init>(JarFile.java:150)
    at java.util.jar.JarFile.<init>(JarFile.java:87)
    at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:90)
    at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:66)
    at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:86)
    at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
    at org.apache.catalina.startup.ExpandWar.expand(ExpandWar.java:148)
    at org.apache.catalina.startup.ContextConfig.fixDocBase(ContextConfig.java:886)
    at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:1021)
    at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:279)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.StandardContext.init(StandardContext.java:5707)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4449)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Invalid or unreadable WAR file :
/home/yclin/Projects/search/search/solr/wars/apache-solr-4.0-2012-04-21_08-25-44.war
    at org.apache.naming.resources.WARDirContext.setDocBase(WARDirContext.java:130)
    at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4320)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4489)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.de

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-22 Thread pcrao
Hi Mikhail Khludnev, 

Thank you for your help.

Let me explain the JVM scenario.
The JVM in which Tomcat is running is not restarted between indexing runs, so
the StreamingUpdateSolrServer keeps running in the same process, whereas the
EmbeddedSolrServer gets a fresh JVM instance (a new process) every time.
In this scenario the index gets corrupted.

If I restart Tomcat (i.e. restart the JVM in which StreamingUpdateSolrServer
is running) after each indexing run, the index doesn't get corrupted.
However, this is not a viable option for us because Solr will not be
available to users during the restart.

Let me know if you have any more thoughts on this.
If you don't, could you also let me know how I can seek help from others?

Thanks again,
PC Rao.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3931636.html
Sent from the Solr - User mailing list archive at Nabble.com.


The index speed in the solr

2012-04-22 Thread neosky
It takes me 50 hours to index a 9 GB file in total (about 2,000,000
documents) with an n-gram filter (min=6, max=10); the tokens going into the
n-gram filter are long (not words; up to 300,000 bytes including whitespace).
I split the data into 4 files and use post.sh to update them at the same
time. I also tried writing a Lucene indexer myself (single thread), and the
time is almost the same.
I would like to know: what is the general bottleneck for indexing in Solr?
Doesn't Solr handle index update requests concurrently?

1.
Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time      Time      Time     Current
                                 Dload  Upload   Total     Spent     Left     Speed
 51 3005M    0     0   51 1557M      0  18902  46:19:14  23:59:46  22:19:28       0
2.
Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time      Time      Time     Current
                                 Dload  Upload   Total     Spent     Left     Speed
 62 2623M    0     0   62 1632M      0  19839  38:31:16  23:58:01  14:33:15   76629
3.
Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time      Time      Time     Current
                                 Dload  Upload   Total     Spent     Left     Speed
 65 2667M    0     0   65 1737M      0  21113  36:48:23  23:58:06  12:50:17   25537
4.
Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time      Time      Time     Current
                                 Dload  Upload   Total     Spent     Left     Speed
 58 2766M    0     0   58 1625M      0  19752  40:47:34  23:58:28  16:49:06   81435


--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3931338.html
Sent from the Solr - User mailing list archive at Nabble.com.
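
Solr itself does accept concurrent update requests; with post.sh each file
goes up as a single huge POST over one connection, so the client side is
often the first bottleneck to rule out. A minimal sketch of a multi-threaded
client using the SolrJ API of this era (the queue size, thread count, and
field names are illustrative assumptions, not tuned values):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentIndexer {
    public static void main(String[] args) throws Exception {
        // Buffers documents and streams them to Solr over 4 parallel connections.
        StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8988/solr", 1000, 4);
        for (int i = 0; i < 2000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);        // hypothetical field names
            doc.addField("seq", nextSequence(i));  // the long pre-ngram token
            server.add(doc);
        }
        server.commit();
    }

    private static String nextSequence(int i) {
        return "...";  // placeholder: read your real data here
    }
}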


Re: How can I get the top term in solr?

2012-04-22 Thread neosky
You are very helpful. Thanks a lot!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-I-get-the-top-term-in-solr-tp3926536p3931252.html
Sent from the Solr - User mailing list archive at Nabble.com.


'Error 404: missing core name in path' in Solr

2012-04-22 Thread vasuj
I used the query

server.deleteByQuery( "*:*" ); // CAUTION: deletes everything!

in my Solr indexing program (screenshot:
http://lucene.472066.n3.nabble.com/file/n3931194/Screenshot_%2847%29.png).
Since then I am receiving an error whenever I go to

http://localhost:8080/solr/admin/

and press search with a query string.

The error is

HTTP Status 400 - Missing solr core name in path

type Status report

message Missing solr core name in path

description The request sent by the client was syntactically incorrect
(Missing solr core name in path).

Apache Tomcat/7.0.21

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-404-missing-core-name-in-path-in-Solr-tp3931194p3931194.html
Sent from the Solr - User mailing list archive at Nabble.com.


searcher leak on trunk after 2/1/2012

2012-04-22 Thread Yonik Seeley
Folks,
If you're using a trunk version after 2/1/2012 in conjunction with the
shipped solrconfig.xml (which uses openSearcher=false in an autoCommit
by default),
then you should upgrade to a new version.  There's a searcher leak
when openSearcher=false is used with a commit, which leads to files not
being closed.

This was just fixed in https://issues.apache.org/jira/browse/SOLR-3392
so if you're looking to use nightly builds, you will need one from Apr
23 or later.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


RE: Crawling an SCM to update a Solr index

2012-04-22 Thread Van Tassell, Kristian
Otis,

Thanks for the input! Were it not for the metadata I need to extract, and the 
slight possibility that a sync or file-system error or inconsistency could 
occur, I would take that same route. 

-Kristian

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, April 20, 2012 10:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Crawling an SCM to update a Solr index

Kristian,

For what it's worth, for http://search-lucene.com and http://search-hadoop.com 
we simply check out the source code from the SCM and index from the file 
system.  It works reasonably well.  The only issue that I can recall us having 
is with the source-code organization under SCM - modules get moved around, and 
sometimes this requires us to update things on our end to match those changes.

Otis

Performance Monitoring for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



>
> From: "Van Tassell, Kristian" 
>To: "solr-user@lucene.apache.org"  
>Sent: Friday, April 20, 2012 3:26 PM
>Subject: Crawling an SCM to update a Solr index
> 
>Hello everyone,
>
>I'm in the process of pulling together requirements for an SCM (source code 
>manager) crawling mechanism for our Solr index. I probably don't need to argue 
>the need for a crawler, but to be specific, we have an index which receives 
>its updates from a custom built application. I would, however, like to 
>periodically crawl the SCM to ensure the index is up to date. In addition, if 
>updates are made which require a complete reindex (such as schema.xml 
>modifications), I could utilize this crawler to update everything or specific 
>areas.
>
>I'm wondering if there are any initiatives, tools (like Nutch) or whitepapers 
>out there, which crawl an SCM. More specifically, I'm looking for a Perforce 
>solution. I'm guessing that there is nothing specific and I'm prepared to 
>design to our specific requirements, but wanted to check with the Solr 
>community prior to getting too far in.
>
>I'm most likely going to build the solution to interact with the SCM directly 
>(via their API) rather than syncing the SCM repository to the filesystem and 
>crawling that way, since there could be filesystem problems syncing the data 
>and because there may be relevant metadata that can be retrieved from the SCM.
>
>Thanks in advance for any information you may have,
>Kristian
>
>
>


Re: SolrCloud: Programmatically create multiple collections?

2012-04-22 Thread Mark Miller
Hey Ravi - yeah, I know this is kind of confusing. The issue is that the true 
state is actually the advertised state in clusterstate.json *and* whether or 
not a node is listed in live_nodes.

The reason this is the case is that if a node just dies, it may have left its 
advertised state as *anything*. The way we know that it's no longer connected 
to ZooKeeper is by looking at live_nodes - those entries are ephemeral and 
will go away if a node goes away.

I've discussed with Sami in the past the idea of the Overseer perhaps taking a 
look when a node goes down and updating its state in clusterstate.json - not 
sure if there are some gotchas with that or not though. Right now only a node 
updates its own state, and the Overseer reads those states and compiles them 
into clusterstate.json.

In any case, it's not a bug, it's expected - but at the same time it might be 
nice if things worked a little more smoothly where possible. 

On Apr 19, 2012, at 7:11 PM, ravi wrote:

> Hi Mark, 
> 
> Thanks for your response. I did manage to get one example running with 2 Solr
> instances, and I checked that shards are created and replicated
> properly. 
> 
> The problem that I am now facing is ZooKeeper's cluster state. If I kill one
> Solr instance (which may hold one or more cores) by pressing CTRL+C,
> ZooKeeper never shows that instance as *down* and keeps on showing that
> instance as *active*.
> 
> The other instance becomes the leader for some of the shards that were
> present in the first instance, though. This suggests that ZooKeeper gets to
> know that one instance went down, but for some strange reason it's not
> updating clusterstate.json. 
> 
> Has this already been reported? Or is there something that I am missing? 
> 
> Thanks!
> Ravi
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-Programmatically-create-multiple-collections-tp3916927p3924698.html
> Sent from the Solr - User mailing list archive at Nabble.com.

- Mark Miller
lucidimagination.com
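
To see the combined view described above, a minimal sketch using the plain
ZooKeeper client API (the connect string is an illustrative assumption; the
paths are the ones SolrCloud uses):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class LiveNodesCheck {
    public static void main(String[] args) throws Exception {
        // Null watcher: we only do one-shot reads here.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, null);
        // Ephemeral entries: a node that dies drops off this list...
        List<String> live = zk.getChildren("/live_nodes", false);
        System.out.println("live nodes: " + live);
        // ...while its last advertised state here may still say "active".
        byte[] state = zk.getData("/clusterstate.json", false, null);
        System.out.println(new String(state, "UTF-8"));
        zk.close();
    }
}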


Re: Large Index and OutOfMemoryError: Map failed

2012-04-22 Thread Michael McCandless
Is it possible you are hitting this (just opened) Solr issue?:

https://issues.apache.org/jira/browse/SOLR-3392

Mike McCandless

http://blog.mikemccandless.com

On Fri, Apr 20, 2012 at 9:33 AM, Gopal Patwa  wrote:
> We cannot avoid auto soft commit, since we need the Lucene NRT feature. And I
> use StreamingUpdateSolrServer for adding/updating the index.
>
> On Thu, Apr 19, 2012 at 7:42 AM, Boon Low  wrote:
>
>> Hi,
>>
>> Also came across this error recently, while indexing with > 10 DIH
>> processes in parallel + default index settings. The JVM grinds to a halt and
>> throws this error. Checking the index of a core reveals thousands of files!
>> Tuning the default autoCommit from 15000ms to 90ms solved the problem
>> for us (no autoSoftCommit).
>>
>> Boon
>>
>> -
>> Boon Low
>> Search UX and Engine Developer
>> brightsolid Online Publishing
>>
>> On 14 Apr 2012, at 17:40, Gopal Patwa wrote:
>>
>> > I checked it was "MMapDirectory.UNMAP_SUPPORTED=true" and below is my
>> > system data. Is there any existing test case to reproduce this issue? I am
>> > trying to understand how I can reproduce this issue with a unit/integration
>> > test.
>> >
>> > I will try a recent Solr trunk build too. If it is some bug in Solr or
>> > Lucene keeping an old searcher open, then how do I reproduce it?
>> >
>> > SYSTEM DATA
>> > ===
>> > PROCESSOR: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
>> > SYSTEM ID: x86_64
>> > CURRENT CPU SPEED: 1600.000 MHz
>> > CPUS: 8 processor(s)
>> > MEMORY: 49449296 kB
>> > DISTRIBUTION: CentOS release 5.3 (Final)
>> > KERNEL NAME: 2.6.18-128.el5
>> > UPTIME: up 71 days
>> > LOAD AVERAGE: 1.42, 1.45, 1.53
>> > JBOSS Version: Implementation-Version: 4.2.2.GA (build:
>> > SVNTag=JBoss_4_2_2_GA date=20
>> > JAVA Version: java version "1.6.0_24"
>> >
>> >
>> > On Thu, Apr 12, 2012 at 3:07 AM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> Your largest index has 66 segments (690 files) ... biggish but not
>> >> insane.  With 64K maps you should be able to have ~47 searchers open
>> >> on each core.
>> >>
>> >> Enabling compound file format (not the opposite!) will mean fewer maps
>> >> ... ie should improve this situation.
>> >>
>> >> I don't understand why Solr defaults to compound file off... that
>> >> seems dangerous.
>> >>
>> >> Really we need a Solr dev here... to answer "how long is a stale
>> >> searcher kept open".  Is it somehow possible 46 old searchers are
>> >> being left open...?
>> >>
>> >> I don't see any other reason why you'd run out of maps.  Hmm, unless
>> >> MMapDirectory didn't think it could safely invoke unmap in your JVM.
>> >> Which exact JVM are you using?  If you can print the
>> >> MMapDirectory.UNMAP_SUPPORTED constant, we'd know for sure.
>> >>
>> >> Yes, switching away from MMapDir will sidestep the "too many maps"
>> >> issue, however, 1) MMapDir has better perf than NIOFSDir, and 2) if
>> >> there really is a leak here (Solr not closing the old searchers or a
>> >> Lucene bug or something...) then you'll eventually run out of file
>> >> descriptors (ie, same  problem, different manifestation).
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> 2012/4/11 Gopal Patwa :
>> >>>
>> >>> I have not changed the mergeFactor; it was 10. The compound index file
>> >>> format is disabled in my config, but I read in the post below that
>> >>> someone had a similar issue and it was resolved by switching from the
>> >>> compound index file format to the non-compound format.
>> >>>
>> >>> Some folks also resolved it by "changing lucene code to disable
>> >>> MMapDirectory." Is this a best practice, and if so, can it be done in
>> >>> configuration?
>> >>>
>> >>> http://lucene.472066.n3.nabble.com/MMapDirectory-failed-to-map-a-23G-compound-index-segment-td3317208.html
>> >>>
>> >>> I have index documents of core1 = 5 million, core2 = 8 million and
>> >>> core3 = 3 million, and all indexes are hosted in a single Solr instance.
>> >>>
>> >>> I am going to use Solr for our site StubHub.com; see the attached
>> >>> "ls -l" list of index files for all cores.
>> >>>
>> >>> SolrConfig.xml:
>> >>>
>> >>>
>> >>>      
>> >>>              false
>> >>>              10
>> >>>              2147483647
>> >>>              1
>> >>>              4096
>> >>>              10
>> >>>              1000
>> >>>              1
>> >>>              single
>> >>>
>> >>>          > class="org.apache.lucene.index.TieredMergePolicy">
>> >>>            0.0
>> >>>            10.0
>> >>>          
>> >>>
>> >>>          
>> >>>            false
>> >>>            0
>> >>>          
>> >>>
>> >>>      
>> >>>
>> >>>
>> >>>      
>> >>>          1000
>> >>>           
>> >>>             90
>> >>>             false
>> >>>           
>> >>>           
>> >>>
>> >> ${inventory.solr.softcommit.duration:1000}
>> >>>           
>> >>>
>> >>>      
>> >>>
>> >>>
>> >>> Forwarded conversation
>> >>> Subject: Large Index and OutOfMemoryError: Map failed
>> >>> --
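
As a footnote to the UNMAP_SUPPORTED question in the quoted thread, printing
the constant is a one-liner; a minimal sketch against the Lucene API of this
era:

import org.apache.lucene.store.MMapDirectory;

public class UnmapCheck {
    public static void main(String[] args) {
        // True when the JVM lets Lucene unmap segment files as soon as they
        // are closed; false means mappings linger until GC, using up maps.
        System.out.println("MMapDirectory.UNMAP_SUPPORTED = "
                + MMapDirectory.UNMAP_SUPPORTED);
    }
}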

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-22 Thread Mikhail Khludnev
To be honest, I have no idea. Can you try shutting down the first JVM after
it has completed indexing, and starting the second JVM only after that? Does
that work?
Which version of Solr are you running?

On Fri, Apr 20, 2012 at 8:14 AM, pcrao  wrote:

> Hi,
>
> Any update?
> Thanks,
> PC Rao
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3925014.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru


Re: Solr Indexing error in this function

2012-04-22 Thread vasuj
Yes, it worked. Thanks, Gora. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Indexing-error-in-this-function-tp3929446p3929673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Indexing error in this function

2012-04-22 Thread Gora Mohanty
On 22 April 2012 15:33, vasuj  wrote:
> Log is :
>
>
> Apr 22, 2012 2:55:17 AM org.apache.solr.update.processor.LogUpdateProcessor
> finish
> INFO: {add=[(null)]} 0 17
> Apr 22, 2012 2:55:17 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required
> field: id
[...]

I'm not very conversant with the ExtractingRequestHandler, but the error
message above seems pretty clear: you need to also supply an id field, which
is presumably the unique document ID that Solr needs.

Regards,
Gora
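
A minimal sketch of that fix against the SolrJ code from the original post
(extra fields go to the ExtractingRequestHandler as literal.<field>
parameters; using the file name as the id is only an illustration, and the
latitude/longitude literals assume matching fields in schema.xml):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractWithId {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        ContentStreamUpdateRequest up =
                new ContentStreamUpdateRequest("/update/extract");
        File f = new File("example.pdf");  // hypothetical input file
        up.addFile(f);
        // Supply the required unique key as a literal; without it the
        // extracted document has no id and Solr rejects the add.
        up.setParam("literal.id", f.getName());
        up.setParam("literal.latitude", "51.9125");
        up.setParam("literal.longitude", "179.5");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(up);
    }
}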


Re: How can I get the top term in solr?

2012-04-22 Thread Dan Tuffery
1) The TermsComponent will return the top terms:

http://wiki.apache.org/solr/TermsComponent

2) Add 'debugQuery=on' to your query and look at the 'explain' section in the
results for information on how often the term appears in the document and how
that is weighted (tf, idf).

On Fri, Apr 20, 2012 at 5:31 PM, neosky  wrote:

> Actually I would like to know the top terms at two levels: the document
> level and the index level.
> 1. The top terms at the document level: I would like to know the terms with
> the highest document frequency across all documents (each term counted only
> once per document). Solr's schema.jsp seems to show the top 10 terms, but it
> only works on a small index; when the index gets large, it is hardly
> possible to get the result. Suppose I want to use SolrJ to get the top 20
> terms - what should I do? I have reviewed schema.jsp, but I have no idea how
> it does this.
>
> 2. I would also like to know how many times a specific term appears in the
> index, i.e. the total count = sum over all documents of (occurrences of the
> term in that document).
>
> Any idea will be appreciated.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-can-I-get-the-top-term-in-solr-tp3926536p3926536.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
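
For the SolrJ side of question 1, a minimal sketch using the TermsComponent
(it assumes a request handler registered at /terms with the TermsComponent
enabled, as in the example solrconfig.xml; the field name is an illustrative
assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TopTerms {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery();
        q.set("qt", "/terms");    // route to the handler with the TermsComponent
        q.setTerms(true);
        q.addTermsField("text");  // hypothetical field name
        q.setTermsLimit(20);      // top 20 terms by document frequency
        QueryResponse rsp = server.query(q);
        for (TermsResponse.Term t : rsp.getTermsResponse().getTerms("text")) {
            System.out.println(t.getTerm() + " -> " + t.getFrequency());
        }
    }
}

Note this returns document frequency (each document counted once), which
matches question 1; the per-occurrence total of question 2 is not exposed
here.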


Re: Solr Indexing error in this function

2012-04-22 Thread vasuj
The log is:


Apr 22, 2012 2:55:17 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[(null)]} 0 17
Apr 22, 2012 2:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:185)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:151)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:269)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

Apr 22, 2012 2:55:17 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={waitSearcher=true&commit=true&Latitude=51.9125&Longitude=179.5&wt=javabin&waitFlush=true&version=2} status=400 QTime=17 
Apr 22, 2012 2:55:17 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[(null)]} 0 24
Apr 22, 2012 2:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:185)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostVal

Re: Storing the md5 hash of pdf files as a field in the index

2012-04-22 Thread kuchenbrett
Hi Lance,

 sounds interesting. The idea was to use a message digest (e.g. an MD5 hash) of 
a file to be indexed as a unique identifier to avoid duplicates. I wasn't 
aware of the de-duplication feature you mention. This feature seems to be the 
exact solution to my problem. In the Solr wiki I found some examples of how to 
configure and trigger it when calling the XmlUpdateRequestHandler. I guess I can 
also use it in a similar way when calling the DataImportHandler, correct?

Many thanks for your suggestion.
 Joe

> The SignatureUpdateProcessor implements a smaller, faster cryptohash.
> It is used by the de-duplication feature.
>
> What's the purpose? Do you need the MD5 algorithm, or is any competent
> cryptohash good enough?
>
> On Sat, Apr 21, 2012 at 5:55 AM,  wrote:
>> Hi Otis,
>>
>> thank you very much for the quick response to my question. I'll have a look
>> at your suggested solution. Do you know if there's any documentation about
>> writing such an Update Request Handler or how to trigger it using the Data
>> Import/Tika combination?
>>
>> Thanks.
>> Joe
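
For reference, a minimal sketch of the original idea - an MD5 digest of the
raw file bytes computed with the plain JDK (the de-duplication feature
discussed above derives its signature from indexed field values instead):

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class Md5OfFile {
    public static String md5(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);  // digest the file incrementally
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();  // e.g. usable as a unique id field value
    }
}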


Solr Indexing error in this function

2012-04-22 Thread vasuj
Solr indexing error in this function. I am using Windows 8 x32, with XAMPP to
configure Solr and Tomcat. I have tried many other forums too, but they were
not helpful. I even tried configuring many XML files in XAMPP/solr but still
could not get it working. Any hints would be helpful. Here is my function for
Solr indexing, along with the imports:

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public void GeoTagIndexToSolr(File f, String Latitude, String Longitude) {
    try {
        String urlString = "http://localhost:8080/solr";
        SolrServer server = new CommonsHttpSolrServer(urlString);
        // Stream the file to the ExtractingRequestHandler and commit.
        ContentStreamUpdateRequest up =
                new ContentStreamUpdateRequest("/update/extract");
        up.addFile(f);
        up.setParam("Latitude", Latitude);
        up.setParam("Longitude", Longitude);
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(up);
    } catch (SolrServerException e) {
        System.out.println("SolrServerException: " + e);
        e.printStackTrace();
    } catch (MalformedURLException e) {
        System.out.println("MalformedURLException: " + e);
        e.printStackTrace();
    } catch (IOException e) {
        System.out.println("IOException: " + e);
        e.printStackTrace();
    }
}

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Indexing-error-in-this-function-tp3929446p3929446.html
Sent from the Solr - User mailing list archive at Nabble.com.