all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hi,
I am starting my Solr instance with the command
java -Dsolr.solr.home=./test1/solr/ -jar start.jar
where I have a solr.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="false" instanceDir="tester" name="tester"/>
  </cores>
</solr>

In the folder tester I have configurations - adapted from the rss examples

DataImporter.xml
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor"
            fileName="^.*\.xml$" recursive="true"
            baseDir="/projects/solrtest/transformedimport">
      <entity name="x" rootEntity="true"
              dataSource="myfilereader"
              processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}"
              stream="false" forEach="/ARTIKEL"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate="processing ${jc.fileAbsolutePath}"
              logLevel="info">
        <field column="title" xpath="/DOKTITEL/OVERSKRIFT1" />
        <field column="text" xpath="/AKROP/TXT" />
      </entity>
    </entity>
  </document>
</dataConfig>

solrconfig.xml - same as the rss example only removed elevate components.

schema.xml


<fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
  <copyField source="title" dest="all_text" />
  <copyField source="txt" dest="all_text" />
</fields>

I also removed the uniqueKey constraint.

When I go to http://localhost:8983/solr/tester/admin/
I get the admin page.
When I run http://localhost:8983/solr/tester/dataimport?command=full-import
it says

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages"/>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>

When I look at the log of that it says a bunch of stuff like:

INFO: processing c:\projects\solrtest\transformed\1.xml
org.apache.solr.common.util.XMLErrorLogger report
WARNING: XmL parser reported xml declaration in null, line 1, column
38: Inconsistent text encoding; declared as utf-8 in xml
declaration, application had passed Cp1252

Here is one of the processed documents

<?xml version="1.0" encoding="utf-8" ?>
<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige. Dette gælder også når
de faktureres koncerninternt, f.eks. fra et moderselskab
(holdingselskab) til et datterselskab.</TXT>
    <TXT>Der er fradragsret for moms vedrørende køb af
administrationsydelser i samme omfang, som virksomheden kan fratrække
momsen af øvrige fællesomkostninger.</TXT>
    <TXT>Hvis administrationsydelser faktureres på tværs af
landegrænserne, f.eks. indenfor internationale koncerner, kan der
gælde forskellige principper for momsberegningen i de enkelte
EU-lande. Hvis en administrationsydelse faktureres fra Danmark til et
datterselskab i et andet land, herunder også i andre EU-lande, er det
myndighedernes holdning, at der skal faktureres med dansk moms.</TXT>
    <TXT>Hvis en administrationsydelse faktureres mellem et selskab og
dets filial/-er, skal faktura altid udstedes uden moms. Handel med
ydelser mellem et selskab og dets filial/-er anses ikke for at udgøre
momspligtige transaktioner.</TXT>
    <TXTO>Regler</TXTO>
    <TXT>
      <LR IDREF="LBKG2005966.§15" CREATOR="autolink" TARGETTYPE="RELML">§ 15</LR>
    </TXT>
  </AKROP>
</ARTIKEL>

If I search for the text Administrationsydelser
http://localhost:8983/solr/tester/select/?q=Administrationsydelser&version=2.2&start=0&rows=10&indent=on
I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Administrationsydelser</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>
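
For reference, the select request above relies on `&` between its parameters. A minimal Python sketch of assembling it is shown here; the host, core name and parameter values are taken from the message, while the helper name `build_select_url` is hypothetical and only for illustration.

```python
from urllib.parse import urlencode

def build_select_url(base, query, rows=10, start=0):
    """Return a Solr select URL with '&'-separated query parameters."""
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
    ]
    # urlencode joins the pairs with '&' and escapes special characters.
    return base.rstrip("/") + "/select/?" + urlencode(params)

url = build_select_url("http://localhost:8983/solr/tester", "Administrationsydelser")
print(url)
```

Building the query string this way also takes care of escaping if the search term contains spaces or non-ASCII characters.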

There is a segments.gen and a segments_4 file in my index but nothing
else. I tried looking with Luke, but it does not seem to be compatible
with the newest versions of Lucene.

The Solr version is 3.1.0.

Thanks,
Bryan Rasmussen


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Also, if I check
solr/tester/dataimport it responds:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">1634</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 11:55:47</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-04-18 11:55:48</str>
    <str name="Optimized">2011-04-18 11:55:48</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken ">0:0:0.922</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>
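
The telling part of this status is that rows were fetched but no documents were processed. A small Python sketch of pulling those counters out of a status response might look like the following. The embedded XML is an abbreviated copy of the response in this message, and the function name `status_counters` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the status response quoted in the message.
STATUS_XML = """<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Rows Fetched">1634</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Total Documents Processed">0</str>
  </lst>
</response>"""

def status_counters(xml_text):
    """Collect the purely numeric entries of statusMessages as integers."""
    root = ET.fromstring(xml_text)
    messages = root.find("lst[@name='statusMessages']")
    counters = {}
    if messages is None:
        return counters
    for entry in messages.findall("str"):
        text = (entry.text or "").strip()
        if text.isdigit():
            counters[entry.get("name")] = int(text)
    return counters

counters = status_counters(STATUS_XML)
rows = counters.get("Total Rows Fetched", 0)
docs = counters.get("Total Documents Processed", 0)
if rows > 0 and docs == 0:
    print(f"{rows} rows fetched but no documents built")
```

A gap like this suggests the files are being found but nothing in them matches the entity configuration, which fits the XPath discussion that follows.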



Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread lboutros
Did you try with the complete XPath?

<field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
<field column="text" xpath="/ARTIKEL/AKROP/TXT" />
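
To see why the leading /ARTIKEL step matters, here is a rough Python sketch that evaluates simple absolute paths against a shortened copy of the sample document. It uses Python's ElementTree rather than Solr's own XPath reader, so it only illustrates the idea of matching from the document root; the helper `match_absolute` and the shortened sample text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from the first message.
SAMPLE = """<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige.</TXT>
    <TXT>Der er fradragsret for moms.</TXT>
  </AKROP>
</ARTIKEL>"""

def match_absolute(root, path):
    """Return the text of every element matched by a plain /a/b/c path.

    The first step must name the root element; otherwise nothing matches.
    """
    steps = [s for s in path.split("/") if s]
    if not steps or steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root.text or ""]
    return [el.text or "" for el in root.findall("/".join(steps[1:]))]

root = ET.fromstring(SAMPLE)
short_hits = match_absolute(root, "/DOKTITEL/OVERSKRIFT1")
full_hits = match_absolute(root, "/ARTIKEL/DOKTITEL/OVERSKRIFT1")
text_hits = match_absolute(root, "/ARTIKEL/AKROP/TXT")
print(short_hits, full_hits, len(text_hits))
```

In this model the short path finds nothing because its first step is not the root element, while the complete path finds the title.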

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/all-searches-return-0-hits-what-have-I-done-wrong-tp2833706p2833798.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hah, actually I tried complete XPaths earlier, but they weren't
working because I had made a mistake in my forEach. I then decided
that the forEach and the other XPaths were probably being
concatenated.

However, it is not entirely correct yet. If I run
http://localhost:8983/solr/tester/dataimport?command=full-import&debug=true
I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">422</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents">
    <lst><arr name="title"><str>Forord (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Abonnementsudgifter (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Ab skf (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Acontobeløb (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adgang til arrangementer (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administration, fast ejendom (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsfællesskab (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsydelser (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adsl (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Advokatomkostninger (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Afbestillingsgebyrer (MomsManual)</str></arr></lst>
  </arr>
  <lst name="verbose-output"/>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">22</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 12:26:52</str>
    <str name="">Indexing completed. Added/Updated: 11 documents. Deleted 0 documents.</str>
    <str name="Total Documents Processed">11</str>
    <str name="Time taken ">0:0:0.406</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>

So the title fields
<field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
are being added, but not the text fields
<field column="text" xpath="/ARTIKEL/AKROP/TXT" />

The most salient difference between these two is that there will be
more than one TXT. I also tried with the parent element, however, and
it didn't do anything.
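
One thing that may be worth ruling out here is a naming mismatch between the import config and the schema. The Python sketch below compares the two, under the assumption that the data import handler writes each column to a schema field of the same name unless the field element carries a `name` attribute. The embedded fragments are copied from the configs in this thread, and the function name `unmatched_columns` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Field mappings from the data import config quoted in this thread.
DIH_FIELDS = """<entity name="x">
  <field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
  <field column="text" xpath="/ARTIKEL/AKROP/TXT" />
</entity>"""

# Field declarations from the schema.xml quoted in this thread.
SCHEMA_FIELDS = """<fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
</fields>"""

def unmatched_columns(dih_xml, schema_xml):
    """List import targets that have no schema field of the same name.

    Assumption: the target is the 'name' attribute if present, else 'column'.
    """
    schema_names = {f.get("name") for f in ET.fromstring(schema_xml).findall("field")}
    targets = [f.get("name") or f.get("column")
               for f in ET.fromstring(dih_xml).findall("field")]
    return [t for t in targets if t not in schema_names]

missing = unmatched_columns(DIH_FIELDS, SCHEMA_FIELDS)
print(missing)
```

If the assumption holds, the column called text has no counterpart in the schema, which declares txt instead.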

But when I do a search for MomsManual, which you can see is in all the
title fields, I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">MomsManual</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

:(

Thanks,
Bryan Rasmussen




Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread lboutros
If a document contains multiple 'txt' values, the field should be
marked as 'multiValued':

<field name="txt" type="text" indexed="true" stored="true" multiValued="true"/>
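
As a rough illustration of the multiValued point, this Python sketch counts how many values each field would receive from a shortened copy of the sample document. The field-to-path mapping and the helper names are assumptions made for the example, and ElementTree paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from the first message.
SAMPLE = """<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige.</TXT>
    <TXT>Der er fradragsret for moms.</TXT>
  </AKROP>
</ARTIKEL>"""

# Hypothetical mapping of schema field names to paths below the root element.
FIELD_PATHS = {"title": "DOKTITEL/OVERSKRIFT1", "txt": "AKROP/TXT"}

def values_per_field(root, field_paths):
    """Collect every matched text value for each field name."""
    return {name: [(el.text or "").strip() for el in root.findall(path)]
            for name, path in field_paths.items()}

def needs_multivalued(values):
    """Return the field names that received more than one value."""
    return sorted(name for name, vals in values.items() if len(vals) > 1)

values = values_per_field(ET.fromstring(SAMPLE), FIELD_PATHS)
multi = needs_multivalued(values)
print(multi)
```

Any field that shows up in the result receives several values per document and so would need multiValued="true" in the schema.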

But if I understand correctly, you also tried this?

<field column="text" xpath="/ARTIKEL/AKROP" />

And for your search (MomsManual), could you give us your analyzer from
the schema.xml, please?

Ludovic.

-
Jouve
France.


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
/
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

  </types>

  <fields>
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="txt" type="text" indexed="true" stored="true" />
    <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
    <copyField source="title" dest="all_text" />
    <copyField source="txt" dest="all_text" />
  </fields>
  <defaultSearchField>all_text</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>

</schema>


The protwords.txt and stopwords.txt are also from the rss example.

thanks,
Bryan Rasmussen




Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
/
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

  </types>

  <fields>
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="txt" type="text" indexed="true" stored="true" />
    <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
    <copyField source="title" dest="all_text" />
    <copyField source="txt" dest="all_text" />
  </fields>
  <defaultSearchField>all_text</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>

</schema>


 the protwords.txt and stopwords.txt are also from the rss example.

 thanks,
 Bryan Rasmussen
