to reduce indexing time
Before indexing, this was the memory layout:

System memory: 63.2% (2.21 GB)
JVM memory: 8.3% (81.60 MB of 981.38 MB)

I have indexed 700 documents of total size 12 MB. These are the results I get:

QTime: 8122
System time: 00:00:12.7318648
System memory: 65.4% (2.29 GB)
JVM memory: 15.3% (148.32 MB of 981.38 MB)

After indexing 7,000 documents:

QTime: 51817
System time: 00:01:12.6028320
System memory: 69.4% (2.43 GB)
JVM memory: 26.5% (266.60 MB)

After indexing 70,000 documents of 1200 MB size, these are the results:

QTime: 511447
System time: 00:11:14.0398768
System memory: 82.7% (2.89 GB)
JVM memory: 11.8% (118.46 MB)

Here the JVM usage decreases compared to 7,000 docs. Why is that?

This is solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

I am indexing through SolrNet, adding each document individually:

var res = solr.Add(doc); // Doc doc = new Doc();

How do I reduce the time for indexing, given that the amount of data indexed is quite small? Will batch indexing reduce the indexing time? If so, do I need to make changes in solrconfig.xml? Also, I want the documents to be searchable within 1 second of indexing. Is it true that if a soft commit is done, then faceting cannot be done on the data?

-- View this message in context: http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391.html Sent from the Solr - User mailing list archive at Nabble.com.
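[Editorial note] Batch indexing generally does reduce wall-clock time, because each individual solr.Add(doc) call is a separate HTTP round-trip; sending documents in groups (SolrNet exposes collection-accepting Add/AddRange overloads for this) amortizes that overhead, and no solrconfig.xml change is needed for batching itself. A minimal Java sketch of the client-side chunking logic; the class and the commented-out send() call are hypothetical stand-ins for the actual batch add:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Split docs into batches of at most batchSize, preserving order.
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            out.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 700; i++) docs.add(i);
        // 700 docs in batches of 250 -> 3 round-trips instead of 700
        List<List<Integer>> b = batches(docs, 250);
        System.out.println(b.size());        // 3
        System.out.println(b.get(2).size()); // 200
        // for (List<Integer> batch : b) send(batch); // one HTTP request per batch
    }
}
```

The batch size (250 here, matching the follow-up experiment in this thread) trades memory per request against the number of round-trips.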
Re: to reduce indexing time
Now I have batch indexed, with batches of 250 documents. These were the results.

After 7,000 documents:
QTime: 46894
System time: 00:00:55.9384892
JVM memory: 24.8% (249.02 MB)

This shows quite a reduction in timing.

After 70,000 documents:
QTime: 480435
System time: 00:09:29.5206727
System memory: 82.8% (2.90 GB)
JVM memory: 82% (818.06 MB)

Here the memory usage has increased, though the timing has reduced.

After disabling soft commit and the transaction log, for 70,000 contracts:
QTime: 461331
System time: 00:09:09.7930326
JVM memory: 62.4% (623.42 MB)

Here the memory usage is less. What causes this memory usage to change, if the data being indexed is the same?
Re: to reduce indexing time
I will surely read about JVM garbage collection. Thanks a lot, all of you. But is the time required for my indexing good enough? I don't know what the ideal timings are; I think my indexing is taking too long.
Solrcloud: no registered leader found and new searcher error
I have configured SolrCloud as follows: http://lucene.472066.n3.nabble.com/file/n4117724/Untitled.png

Solr.xml:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}" hostContext="solr">
    <core loadOnStartup="true" instanceDir="document\" transient="false" name="document"/>
    <core loadOnStartup="true" instanceDir="contract\" transient="false" name="contract"/>
  </cores>
</solr>

I have added all the required config for SolrCloud, as per http://wiki.apache.org/solr/SolrCloud#Required_Config

I am adding data to the core "document". Now when I try to index using SolrNet (solr.Add(doc)), I get this error:

SEVERE: org.apache.solr.common.SolrException: No registered leader was found, collection:document slice:shard2
  at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

and also this error:

SEVERE: null:java.lang.RuntimeException: SolrCoreState already closed
  at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

I guess it is because the leader is from the core "contract" and I am trying to index into the core "document"? Is there a way to change the leader, and how? How can I change the state of the shards from "gone" to "active"?

Also, when I try to query q=*:*, this is shown:

org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)

I read that this searcher error comes when the number of commits is exceeded, but I did not issue a commit command, so how can the commits be exceeded?
I read that it also requires some warming settings, so I added this to solrconfig.xml, but I still get the same error:

<query>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">solr</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
      <lst>
        <str name="q">rocks</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

I have just started with SolrCloud; please tell me if I am doing anything wrong in the SolrCloud configuration. Also, I did not find good material on SolrCloud on Windows 7 with Apache Tomcat, so please suggest something for that too. Thanks a lot.
Re: Solrcloud: no registered leader found and new searcher error
How do I get them running?
Re: using extract handler: data not extracted
Yes, all 3 points are right. Let me solve the first one: there is some error at the Tika level during indexing, and for that I need to debug at the Tika level, right? But how do I do that? The Solr admin does not show package-wise logging.
Re: using extract handler: data not extracted
Through the command line (java -jar tika-app-1.4.jar -v C:\Cloud.docx), Apache Tika is able to parse .docx files. So can I use this tika-app-1.4.jar in Solr? How do I do that?
Re: using extract handler: data not extracted
Sorry for the mistake. I'm using Solr 4.2, which has Tika 1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without error or message. Also, java -jar tika-app-1.4.jar -t C:\Cloud.docx shows the entire document. Which means there is no problem in Tika, right?
Re: using extract handler: data not extracted
Sorry for the mistake. I'm using Solr 4.2, which has Tika 1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without error or message. Also, java -jar tika-app-1.3.jar -t C:\Coding.pdf shows the entire document. Which means there is no problem in Tika, right?
Re: using extract handler: data not extracted
I am working on Windows 7.
using extract handler: data not extracted
I need to index rich text documents. This is solrconfig.xml for the extract handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

My schema.xml is:

<field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
<field name="title" type="title_text" indexed="true" stored="true"/>
<field name="date_modified" type="date" indexed="true" stored="true" multivalued="true"/>
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="text" indexed="true" stored="true" multiValued="true"/>

But after indexing using this curl:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@Coding.pdf"

when queried as q=id:12, the output is:

<arr name="ignored_stream_source_info"><str>myfile</str></arr>
<arr name="ignored_stream_content_type"><str>application/octet-stream</str></arr>
<arr name="ignored_stream_size"><str>3336935</str></arr>
<arr name="ignored_stream_name"><str>Coding.pdf</str></arr>
<arr name="ignored_content_type"><str>application/pdf</str></arr>
<str name="contents"/>   (contents not shown)
<long name="_version_">1456831756526157824</long>
<str name="doc_id">8eb229e0-5f25-4d26-bba4-6cb67aab7f81</str>
</doc>

Why is that? Also, why does the date_modified field not appear?
Re: using extract handler: data not extracted
Sorry that my question was not clear. Initially, when I indexed PDF files, the data within the PDF was shown in the contents field, as follows (this is the output for the initially indexed documents):

<str name="contents">
Cloud ctured As tale in size as well as complexity. We need a cloud based system that will solve this problem. Provide interfaces to registeP CSS Client Measurements Benchmarkinse times by varying Number of documents fromnds to millions Nuervers from 1 to 5 Storage and search options as discussed abo
</str>

But for newly indexed documents, the contents field is empty. Coding.pdf is 3 MB in size, but as shown in the output, the contents of this PDF are not extracted; indexing extracts the metadata but not the contents of the file, and the contents field is empty:

<str name="contents"/>

What is the reason for this? Is it because of some missing jar?
Re: using extract handler: data not extracted
I set the log level of the extract handler to finest; now the logs are:

INFO: [document] webapp=/solr path=/update/extract params={commit=true&literal.id=12&debug=true} {add=[12 (1456944038966984704)],commit=} 0 2631
Jan 11, 2014 7:51:57 PM org.apache.solr.servlet.SolrDispatchFilter handleAdminRequest
INFO: [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json} status=0 QTime=0
Jan 11, 2014 7:51:57 PM org.apache.solr.core.SolrCore execute
INFO: [contract] webapp=/solr path=/admin/system params={wt=json} status=0 QTime=1
Jan 11, 2014 7:51:58 PM org.apache.solr.core.SolrCore execute
INFO: [document] webapp=/solr path=/admin/mbeans params={stats=true&wt=json} status=0 QTime=3

This shows no error. Also, in the curl query I have set debug=true. What is the reason?
Re: using extract handler: data not extracted
How do I set finest for the Tika package?
Re: using extract handler: data not extracted
The logging screen does not show the Tika package. Also, I searched on the net, and it seems this requires the log4j and slf4j jars; is that true? Do I need to do extra configuration for package-level logging?
Re: using extract handler: data not extracted
This is the output I get when indexing through SolrJ; I followed the link you suggested. I tried indexing a .doc file.

<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">17</int>
  </lst>
  <lst name="error">
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'id:C:\solr\document\src\new_index_doc\document_1.doc': Encountered ":" at line 1, column 4. Was expecting one of: EOF, AND ..., OR ..., NOT ..., "+" ..., "-" ..., BAREOPER ..., "(" ..., "*" ..., "^" ..., QUOTED ..., TERM ..., FUZZY_SLOP ..., PREFIXTERM ..., WILDTERM ..., REGEXPTERM ..., "[" ..., "{" ..., LPARAMS ..., NUMBER ...</str>
    <int name="code">400</int>
  </lst>
</response>

Also, when indexing with SolrNet, I get this error:

Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name org/apache/xmlbeans/XmlCursor

Why this linkage error? Now curl does not work, and neither does SolrJ nor SolrNet.
to index byte array
I am converting .doc and .docx files to byte arrays in C#, and now I need to index these byte arrays of the doc files. Is it possible in Solr to index a byte array of a file?
Re: to index byte array
Indexing .docx files using Tika requires a file system path, but I don't want to give the path. I read in the DIH FAQ that by using a transformer, the output can be converted from bytes to a string.
Re: to index byte array
If you consider a client-server architecture, the documents will be sent in binary format to the server; for Solr, this binary format will be the source to index, so I need to index a byte array. Also, if I store this byte array in a DB and then index it in Solr, will the contents of the document be searchable like normal documents? (Because the contents are in binary format, will Solr match the query?)
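[Editorial note] Solr's /update/extract endpoint consumes a raw content stream rather than a file-system path, so a byte array can be posted directly; Tika then extracts text server-side, and the extracted text (not the raw bytes) is what becomes searchable. A minimal stdlib Java sketch of the client side; the class name is hypothetical and the actual HTTP upload (via SolrNet or SolrJ) is elided:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteArrayUpload {
    // Read a document into a byte array, as the C# client does before sending.
    static byte[] toBytes(Path doc) throws IOException {
        return Files.readAllBytes(doc);
    }

    // Wrap the bytes as a stream; the extract handler reads a content stream,
    // so no file-system path is needed on the Solr side.
    static InputStream asStream(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("contract", ".docx");
        // .docx files begin with the ZIP magic bytes; a 4-byte stand-in here
        Files.write(tmp, new byte[] {0x50, 0x4b, 0x03, 0x04});
        byte[] payload = toBytes(tmp);
        System.out.println(payload.length); // 4
        // payload (or asStream(payload)) would be posted to /update/extract
    }
}
```

Storing the raw bytes in a DB and indexing them verbatim would not make the text searchable; the bytes must pass through extraction (Tika) first.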
indexing .docx using solrj
I am trying to index a .docx file using SolrJ. I referred to this link: http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

My code is:

import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class rich_index {
    public static void main(String[] args) {
        try {
            // Solr Cell can also index MS file types (2003 and 2007 versions).
            String fileName = "C:\\solr\\document\\src\\test1\\contract.docx";
            // this will be the unique id used by Solr to index the file contents
            String solrId = "contract.docx";
            indexFilesSolrCell(fileName, solrId);
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }

    public static void indexFilesSolrCell(String fileName, String solrId)
            throws IOException, SolrServerException {
        String urlString = "http://localhost:8080/solr/document";
        SolrServer solr = new HttpSolrServer(urlString);
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName), "text");
        up.setParam("literal.id", solrId);
        up.setParam("uprefix", "ignored_");
        up.setParam("fmap.content", "contents");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println(rsp);
    }
}

These are my logs:

Dec 22, 2013 12:27:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [document] webapp=/solr path=/update/extract params={fmap.content=contents&waitSearcher=true&commit=true&uprefix=ignored_&literal.id=contract.docx&wt=javabin&version=2&softCommit=false} {} 0 0
Dec 22, 2013 12:27:58 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException:
java.lang.NoClassDefFoundError: org/apache/xml/serialize/BaseMarkupSerializer
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)

To resolve this, I added xerces.jar to the build path; it has the org/apache/xml/serialize/BaseMarkupSerializer class, but the error is not resolved. What is the problem?

Solrconfig:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Schema:

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="title_text" indexed="true" stored="true"/>
  <field name="date_modified" type="date"
Re: indexing .docx using solrj
I have added that jar to the build path, but I get the same error. Why is Eclipse not recognising that jar? The logs also show this:

Caused by: java.lang.NoClassDefFoundError: org/apache/xml/serialize/BaseMarkupSerializer
  at org.apache.solr.handler.extraction.ExtractingRequestHandler.newLoader(ExtractingRequestHandler.java:117)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
  ... 16 more
Caused by: java.lang.ClassNotFoundException: org.apache.xml.serialize.BaseMarkupSerializer
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1688)
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1533)
  ... 22 more
Re: indexing .docx using solrj
The jar is already there in the lib folder of the Solr home.
Re: program termination in solrj
Before and after running the client, the stats remain the same:

class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java $
stats:
  commits: 0
  autocommits: 0
  soft autocommits: 0
  optimizes: 0
  rollbacks: 0
  expungeDeletes: 0
  docsPending: 0
  adds: 0
  deletesById: 0
  deletesByQuery: 0
  errors: 0
  cumulative_adds: 0
  cumulative_deletesById: 0
  cumulative_deletesByQuery: 0
  cumulative_errors: 0
Re: indexing .docx using solrj
Solr 4.2, Tomcat 7.0, JDK 1.7.0_45. I have created the Solr home in C:\solr, as in the Java options: -Dsolr.solr.home=C:\solr. C:\solr\lib contains the Tika jars; actually, I pasted all the jars from the Solr 4.2 dist and contrib folders into C:\solr\lib. tomcat/lib contains all the jars from the installation.
Re: program termination in solrj
Also, my default search handler has no dismax:

<requestHandler name="/select" class="solr.SearchHandler"/>
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="fl">*</str>
    <str name="df">contents</str>
    <str name="version">2.1</str>
  </lst>
</requestHandler>
Re: program termination in solrj
Okay, I made a mistake: I did not refresh the stats. The stats after running the Java program are:

commits: 1
autocommits: 0
soft autocommits: 0
optimizes: 0
rollbacks: 0
expungeDeletes: 0
docsPending: 0
adds: 0
deletesById: 0
deletesByQuery: 0
errors: 0
cumulative_adds: 1
cumulative_deletesById: 0
cumulative_deletesByQuery: 0
cumulative_errors: 0
Re: indexing .docx using solrj
It is working now; I just restarted the computer. But I still don't get the reason for the error. Thank you for your efforts, though.
Re: indexing .docx using solrj
Yes, I copied all the jars from contrib/extraction to solr/lib. It is not finding the POI jar now; as mentioned in my post above, it shows a new error.
Java heap space:out of memory
I just indexed 10 docs of total size 15 MB. For some queries it works fine, but for some queries I get this error:

<response>
  <lst name="error">
    <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
    </str>
    <int name="code">500</int>
  </lst>
</response>

I have directly indexed them into Solr.

My schema.xml is:

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="title_text" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <copyfield source="id" dest="text"/>
  <dynamicField name="ignored_*" type="text" indexed="false" stored="false" multiValued="true"/>
  <field name="spelltext" type="spell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="contents" dest="spelltext"/>
</fields>

I don't understand why I get this error for such a small number of docs. I haven't studied much about Solr performance details. How do I increase the heap size? I need to index a lot more data still. Thanks in advance.
Re: Java heap space:out of memory
4 GB RAM. I am running on Windows 7, with Tomcat as the web server.
Re: Java heap space:out of memory
Sorry, but I don't know how to check that.
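[Editorial note] Besides the Solr admin dashboard (used later in this thread), the maximum heap can be checked from inside any JVM with Runtime.maxMemory(); a minimal sketch, with the class name hypothetical:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reports the -Xmx ceiling the JVM will attempt to use
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / (1024 * 1024));
        // totalMemory()/freeMemory() reflect the currently allocated heap
        System.out.printf("total now: %d MB%n", rt.totalMemory() / (1024 * 1024));
        System.out.printf("free now:  %d MB%n", rt.freeMemory() / (1024 * 1024));
    }
}
```

The JVM-memory bar on the Solr dashboard reports the same numbers for the servlet container's JVM, which is the one that matters here.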
Re: Java heap space:out of memory
Okay, thanks. Here it is: max heap size: 63.56 MB (it is showing 37.2% usage, though). How do I increase that size?
Re: Java heap space:out of memory
I have set JAVA_OPTS with the value: -Xms1024M-Xmx1024M. But the dashboard still shows 64M, though now the usage is only 18%. How could that be? Yesterday it was 87%.
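[Editorial note] For Tomcat started from the startup scripts, heap options are conventionally placed in a setenv file rather than a global JAVA_OPTS environment variable; a Tomcat installed as a Windows service (as resolved later in this thread) ignores both and takes its memory pool sizes from the service configuration UI instead. A sketch of bin/setenv.sh with example values (bin\setenv.bat on Windows uses set instead of export); note the required space between the two flags:

```shell
# bin/setenv.sh -- picked up by catalina.sh at startup (example values)
export CATALINA_OPTS="-Xms512m -Xmx1024m"
```

Setting -Xms and -Xmx to the same value avoids heap resizing during heavy indexing.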
Re: Java heap space:out of memory
Yes, I did put the space, as in the image.
Re: Java heap space:out of memory
You were right: the changes made in JAVA_OPTS didn't increase the heap size. I made the changes in the Tomcat UI instead:

Initial memory pool: 512 MB
Maximum memory pool: 1024 MB

Now the heap size has increased. Thank you all for your suggestions; it really saved my time.
Re: Null pointer exception in spell checker at addchecker method
Yes, it worked, and I got the reason for the error. Thanks a lot.
Null pointer exception in spell checker at addchecker method
I'm trying to use the spell check component. My *schema* is (I have included only the fields necessary for spell check, not the entire schema):

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <copyField source="id" dest="text"/>
  <dynamicField name="ignored_*" type="text" indexed="false" stored="false" multiValued="true"/>
  <field name="spelltext" type="spell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="contents" dest="spelltext"/>
</fields>

<types>
  <fieldType name="spell" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

My *solrconfig* is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">contents</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.8</float>
    <int name="maxEdits">1</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">contents</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I get this *error*:

java.lang.NullPointerException
    at org.apache.solr.spelling.*ConjunctionSolrSpellChecker.addChecker*(ConjunctionSolrSpellChecker.java:58)
    at org.apache.solr.handler.component.SpellCheckComponent.getSpellChecker(SpellCheckComponent.java:475)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:106)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

I know the error is likely in the addChecker method. I read this method, and it is written so that default values are substituted for all null values (e.g. if (queryAnalyzer == null) queryAnalyzer = checker.getQueryAnalyzer();). So I suspect that a null checker value is passed in when /checkers.add(checker);/ is executed.
If I am right, tell me how to resolve this; otherwise, what has gone wrong? Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-in-spell-checker-at-addchecker-method-tp4105489.html
Sent from the Solr - User mailing list archive at Nabble.com.
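A possible reading of the trace, offered as a guess rather than a confirmed diagnosis: the /spell handler requests three dictionaries (direct, default, and wordbreak), but the config above never defines a spellchecker named "default", and the two searchComponents share the name "spellcheck", so the second definition replaces the first. Either point can leave SpellCheckComponent handing a null checker to ConjunctionSolrSpellChecker.addChecker. A sketch of a single merged component, using only the names that appear in the post:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">contents</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">contents</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
  </lst>
</searchComponent>
```

With this in place, the <str name="spellcheck.dictionary">default</str> line in the /spell handler's defaults would also need to be removed, since no dictionary by that name exists.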
Re: no such field error:smaller big block size details while indexing doc files
I will try using SolrJ, thanks. But when I tried to index a .docx file I got a different error:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

I read this solution (http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika), which says that removing certain jars solves such errors, but none of the jars it mentions are on my classpath. Could jars still be the cause of the issue? Thank you.

On Wednesday, October 9, 2013 12:54 PM, sweety shinde sweetyshind...@yahoo.com wrote:

I will try using solrJ.
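An aside on the VerifyError above: before suspecting the jars or Solr itself, it can help to confirm that the .docx is readable at all without going through Tika/POI. A .docx is just a zip archive whose body text lives in word/document.xml, so the Python standard library is enough for a sanity check. This is a sketch, not part of Solr or Tika; docx_text is a made-up helper name.

```python
# Sketch: extract the body text of a .docx without Tika/POI, as a quick
# check that the file itself is readable. A .docx is a zip archive; the
# visible text sits in <w:t> elements inside word/document.xml.
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}localname form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path):
    with zipfile.ZipFile(path) as z:
        body = z.read("word/document.xml")
    root = ET.fromstring(body)
    # Concatenate every text run in document order.
    return "".join(t.text or "" for t in root.iter(W_NS + "t"))
```

If this raises on the same file that Solr Cell rejects, the document is likely corrupt; if it succeeds, the problem is more likely in the POI/Tika jars on the server.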
Re: no such field error:smaller big block size details while indexing doc files
I will try using SolrJ. Now I tried indexing .docx files and I got a different error; the logs are:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

But can jars cause these errors? I read one solution which said that removing a few jars from the classpath may solve the errors, but those jars are not present in my classpath. (The link to the solution: http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika) Thank you.

On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] ml-node+s472066n4094231...@n3.nabble.com wrote:

Hmmm, that is odd, the glob dynamicField should pick this up. Not quite sure what's going on.
You can parse the file via Tika yourself and look at what's in there; it's a relatively simple SolrJ program. Here's a sample: http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Oct 8, 2013 at 4:15 PM, sweety [hidden email] wrote:

This is my new schema.xml:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*" type="ignored" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="ignored" stored="false" indexed="false" class="solr.StrField"/>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
</schema>
Re: no such field error:smaller big block size details while indexing doc files
This is my new schema.xml:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*" type="ignored" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="ignored" stored="false" indexed="false" class="solr.StrField"/>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I still get the same error.

From: Erick Erickson [via Lucene] ml-node+s472066n4094013...@n3.nabble.com
To: sweety sweetyshind...@yahoo.com
Sent: Tuesday, October 8, 2013 7:16 AM
Subject: Re: no such field error:smaller big block size details while indexing doc files

Well, one of the attributes parsed out of, probably, the meta-information associated with one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS, and Solr Cell is faithfully sending that to your index. If you want to throw all these in the bit bucket, try defining a true catch-all field that ignores things, like this.
<dynamicField name="*" type="ignored" multiValued="true"/>

Best,
Erick

On Mon, Oct 7, 2013 at 8:03 AM, sweety [hidden email] wrote:

I'm trying to index .doc, .docx, and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
no such field error:smaller big block size details while indexing doc files
I'm trying to index .doc, .docx, and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

Also, using the same type of URL, .txt, .mp3, and .pdf files are indexed successfully.
(curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt")

Schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I'm not able to understand what kind of error this is; please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883.html
Sent from the Solr - User mailing list archive at Nabble.com.
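A note on both failures in this thread: NoSuchFieldError and VerifyError raised from inside org.apache.poi classes are the classic symptoms of mixed Apache POI jar versions on the classpath (for instance, an old poi jar sitting next to a newer poi-ooxml), which is also what the linked StackOverflow answer is getting at. Scanning the server's lib directories for duplicate POI jars is a reasonable first check. A small standard-library Python sketch; poi_jars is a made-up helper, and the directory to scan depends on the Tomcat/Solr layout:

```python
# Sketch: list POI-related jars under a lib tree so that duplicate or
# mismatched versions (poi, poi-ooxml, poi-scratchpad, ...) stand out.
from pathlib import Path

def poi_jars(libdir):
    # Recursive scan; returns jar file names sorted for easy comparison.
    return sorted(p.name for p in Path(libdir).rglob("poi*.jar"))
```

If the scan over, say, the webapp's WEB-INF/lib and any shared Tomcat lib directories turns up two different version numbers for the same POI artifact, removing the older one and restarting the container would be the first thing to try.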