Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Also, if I check solr/tester/dataimport, it responds:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">1634</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 11:55:47</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-04-18 11:55:48</str>
    <str name="Optimized">2011-04-18 11:55:48</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken ">0:0:0.922</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>


On Mon, Apr 18, 2011 at 11:46 AM, bryan rasmussen
rasmussen.br...@gmail.com wrote:
 Hi,
 I am starting my Solr instance with the command
 java -Dsolr.solr.home=./test1/solr/ -jar start.jar
 where I have this solr.xml file:
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <solr sharedLib="lib" persistent="true">
   <cores adminPath="/admin/cores">
     <core default="false" instanceDir="tester" name="tester"/>
   </cores>
 </solr>

 In the tester folder I have configuration files adapted from the RSS example:

 DataImporter.xml
 <dataConfig>
   <dataSource name="myfilereader" type="FileDataSource"/>
   <document>
     <entity name="jc" rootEntity="false" dataSource="null"
             processor="FileListEntityProcessor"
             fileName="^.*\.xml$" recursive="true"
             baseDir="/projects/solrtest/transformedimport"
             >
       <entity name="x" rootEntity="true"
               dataSource="myfilereader"
               processor="XPathEntityProcessor"
               url="${jc.fileAbsolutePath}"
               stream="false" forEach="/ARTIKEL"
               transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
               logTemplate="processing ${jc.fileAbsolutePath}"
               logLevel="info"
               >
         <field column="title" xpath="/DOKTITEL/OVERSKRIFT1" />
         <field column="text"  xpath="/AKROP/TXT" />
       </entity>
     </entity>
   </document>
 </dataConfig>

 solrconfig.xml: same as the RSS example, except that I removed the elevate components.

 schema.xml


 <fields>
   <field name="title" type="text" indexed="true" stored="true" />
   <field name="txt" type="text" indexed="true" stored="true" />
   <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
   <copyField source="title" dest="all_text" />
   <copyField source="txt" dest="all_text" />
 </fields>

 I also removed the uniqueKey constraint.

 When I go to http://localhost:8983/solr/tester/admin/ I get the admin page.
 When I run http://localhost:8983/solr/tester/dataimport?command=full-import
 it responds:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">16</int>
   </lst>
   <lst name="initArgs">
     <lst name="defaults">
       <str name="config">dataimporter.xml</str>
     </lst>
   </lst>
   <str name="command">full-import</str>
   <str name="status">idle</str>
   <str name="importResponse"/>
   <lst name="statusMessages"/>
   <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
 </response>

 The log for that import contains many entries like this:

 INFO: processing c:\projects\solrtest\transformed\1.xml
 org.apache.solr.common.util.XMLErrorLogger report
 WARNING: XmL parser reported xml declaration in null, line 1, column
 38: Inconsistent text encoding; declared as utf-8 in xml
 declaration, application had passed Cp1252

 Here is one of the processed documents

 <?xml version="1.0" encoding="utf-8" ?>
 <ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
   <DOKTITEL>
     <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
   </DOKTITEL>
   <AKROP>
     <TXT>Administrationsydelser er momspligtige. Dette gælder også når
 de faktureres koncerninternt, f.eks. fra et moderselskab
 (holdingselskab) til et datterselskab.</TXT>
     <TXT>Der er fradragsret for moms vedrørende køb af
 administrationsydelser i samme omfang, som virksomheden kan fratrække
 momsen af øvrige fællesomkostninger.</TXT>
     <TXT>Hvis administrationsydelser faktureres på tværs af
 landegrænserne, f.eks. indenfor internationale koncerner, kan der
 gælde forskellige principper for momsberegningen i de enkelte
 EU-lande. Hvis en administrationsydelse faktureres fra Danmark til et
 datterselskab i et andet land, herunder også i andre EU-lande, er det
 myndighedernes holdning, at der skal faktureres med dansk moms.</TXT>
     <TXT>Hvis en administrationsydelse faktureres mellem et selskab og
 dets filial/-er, skal faktura altid udstedes uden moms. Handel med
 ydelser mellem et selskab og dets filial/-er anses ikke for at udgøre
 momspligtige transaktioner.</TXT>
     <TXTO>Regler</TXTO>
     <TXT>
       <LR IDREF="LBKG2005966.§15" CREATOR="autolink" TARGETTYPE="RELML">§ 15</LR>
     </TXT>
   </AKROP>
 </ARTIKEL>

 If I search for the text Administrationsydelser
 

Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread lboutros
Did you try with the complete XPath?

<field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
<field column="text"  xpath="/ARTIKEL/AKROP/TXT" />

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/all-searches-return-0-hits-what-have-I-done-wrong-tp2833706p2833798.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hah, actually I tried complete XPaths earlier, but they weren't
working because I had made a mistake in my forEach. I then decided
that the forEach and the other XPaths were probably being
concatenated.
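
A minimal sketch of why the shortened paths matched nothing, assuming (as the fix in this thread suggests) that each field xpath is resolved from the document root and is not appended to forEach. This is plain Python with a hypothetical helper, select_absolute; it is a stand-in for illustration, not Solr's own XPath reader, and the sample document is shortened:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample ARTIKEL document posted in this thread.
SAMPLE = """<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige.</TXT>
  </AKROP>
</ARTIKEL>"""


def select_absolute(root, xpath):
    """Resolve a plain absolute path like /A/B/C (hypothetical helper)."""
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # the first step has to name the document's root element
    nodes = [root]
    for step in steps[1:]:
        # Descend one level, keeping only children whose tag matches the step.
        nodes = [child for node in nodes for child in node if child.tag == step]
    return nodes


root = ET.fromstring(SAMPLE)
short_hits = select_absolute(root, "/DOKTITEL/OVERSKRIFT1")
full_hits = select_absolute(root, "/ARTIKEL/DOKTITEL/OVERSKRIFT1")
print(len(short_hits), len(full_hits))
```

Under that assumption the shortened path finds no element and the complete path finds the one title, which matches the move from 0 documents to 11 documents reported in this thread.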

It is still not entirely correct, however. If I run
http://localhost:8983/solr/tester/dataimport?command=full-import&debug=true
I get:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">422</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataimporter.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents">
    <lst><arr name="title"><str>Forord (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Abonnementsudgifter (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Ab skf (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Acontobeløb (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adgang til arrangementer (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administration, fast ejendom (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsfællesskab (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Administrationsydelser (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Adsl (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Advokatomkostninger (MomsManual)</str></arr></lst>
    <lst><arr name="title"><str>Afbestillingsgebyrer (MomsManual)</str></arr></lst>
  </arr>
  <lst name="verbose-output"/>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">22</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-04-18 12:26:52</str>
    <str name="">Indexing completed. Added/Updated: 11 documents. Deleted 0 documents.</str>
    <str name="Total Documents Processed">11</str>
    <str name="Time taken ">0:0:0.406</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>

So the title fields
 <field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
are being added, but not the text fields
 <field column="text"  xpath="/ARTIKEL/AKROP/TXT" />

The most salient difference between the two is that there will be more
than one TXT. I just tried with the parent element instead, however,
and it didn't do anything.

But when I search for MomsManual, which you can see is in all the
title fields, I get:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">MomsManual</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

:(

Thanks,
Bryan Rasmussen


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread lboutros
If a document contains multiple 'txt' values, the field should be
marked as 'multiValued':

<field name="txt" type="text" indexed="true" stored="true" multiValued="true"/>
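
A quick way to see which paths yield more than one value per document, sketched in Python on a shortened copy of the sample ARTIKEL from this thread. The helper count_values is hypothetical, and ElementTree paths are written relative to the root element, so /ARTIKEL/AKROP/TXT becomes AKROP/TXT here:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document: two TXT paragraphs and one TXTO.
SAMPLE = """<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige.</TXT>
    <TXT>Der er fradragsret for moms.</TXT>
    <TXTO>Regler</TXTO>
  </AKROP>
</ARTIKEL>"""


def count_values(xml_text, path):
    """Count the elements a root-relative path selects (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(path))


title_count = count_values(SAMPLE, "DOKTITEL/OVERSKRIFT1")
txt_count = count_values(SAMPLE, "AKROP/TXT")
print(title_count, txt_count)
```

The title path selects a single element while the TXT path selects several, so the target field for TXT is the one that would need multiValued="true".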

But if I understand correctly, you also tried this?

<field column="text" xpath="/ARTIKEL/AKROP" />

And for your search (MomsManual), could you give us your analyzer from the
schema.xml, please?

Ludovic.

-
Jouve
France.


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Well, basically I copied the RSS example, as I figured that would be
the closest to what I wanted to do:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="tester" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
         but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0" catenateWords="1"
                catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />
      </analyzer>
    </fieldType>

    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />

    <fieldtype name="html" stored="true" indexed="true" class="solr.TextField">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer

Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hmm, OK, I see the schema was wrong: I was calling the TEXT field
txt. Also, I am now getting results on my title search after another
restart and reindex, with the TXT fields set to multiValued.
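
The name mismatch described above can be caught with a simple set comparison. This is only a sketch in Python: the column and field names are copied from the configs posted in this thread, and unmatched_columns is a hypothetical helper, not part of Solr:

```python
# Column names from the DataImporter.xml fields and field names from
# schema.xml, both as originally posted in this thread.
dih_columns = {"title", "text"}
schema_fields = {"title", "txt", "all_text"}


def unmatched_columns(columns, fields):
    """Return import columns that have no schema field of the same name."""
    return sorted(set(columns) - set(fields))


missing = unmatched_columns(dih_columns, schema_fields)
print(missing)
```

With the names as posted, the only unmatched column is "text"; renaming either the column or the schema field so that the two agree leaves the list empty.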

Thanks,
Bryan Rasmussen
