Re: Default core in multi-core

2008-04-21 Thread Ryan McKinley

hymmm -- "default" should be removed and should not do anything.

The intended behavior is that /solr/select?q=*:* should be a 404; you  
would need to call

 /solr/core0/select or /solr/core1/select to get anything.

So yes, this is a bug.  I'll remove the old "default=true" bit and  
file a bug to make sure we fix it before 1.3.


thanks
ryan


On Apr 21, 2008, at 4:00 PM, James Brady wrote:

Hi all,
In the latest trunk version, default='true' doesn't have the effect  
I would have expected running in multi-core mode.


The example multicore.xml has:
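(The entries were presumably along these lines -- a hedged reconstruction 
from the discussion and the 1.3 trunk example; attribute names are assumed:)

<multicore adminPath="/admin/multicore" persistent="true">
  <core name="core0" instanceDir="core0" default="true" />
  <core name="core1" instanceDir="core1" />
</multicore>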



But queries such as
/solr/select?q=*:*
and
/solr/admin/

are executed against core1, not core0 as I would have expected: it  
seems that the last core defined in multicore.xml is the de facto  
'default'.


Is this a bug or am I missing something?

Thanks,
James




More Like This boost

2008-04-21 Thread Francisco Sanmartin
Is it possible to boost the query that MoreLikeThis returns before 
sending it to Solr? I mean, technically it is possible, because you can add 
a factor to the whole query, but does it make sense? (Remember that 
MoreLikeThis can already boost each term inside the query.)


For example, this could be a result of MoreLikeThis (with native 
boosting enabled)


queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
morelikethis^0.67)


what I want to do is

queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
morelikethis^0.67)^0.60  <---(notice the boost of 0.60 for the whole 
query)


Does Solr apply the boost with a "distributive" property (as in 
mathematics)? Does it really boost it, or does it ignore it (because the 
terms have already been boosted inside)?
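(For reference: in Lucene scoring, boosts multiply down the query tree, so 
an outer boost of 0.60 effectively scales each inner term boost by 0.60 -- 
distributive in that sense, though applied alone it only rescales all 
scores uniformly. A minimal, hedged Lucene sketch of the construction; the 
field name "text" is an assumption:)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MltBoostSketch {
    public static BooleanQuery boostedMltQuery() {
        BooleanQuery mlt = new BooleanQuery();
        // One of the per-term boosts MoreLikeThis would produce
        TermQuery tq = new TermQuery(new Term("text", "morelikethis"));
        tq.setBoost(0.67f);
        mlt.add(tq, BooleanClause.Occur.SHOULD);
        // Outer boost on the whole MLT query: at scoring time it
        // multiplies with each inner boost (0.67 * 0.60 for this term)
        mlt.setBoost(0.60f);
        return mlt;
    }
}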


Thanks in advance.

Pako


Re: CorruptIndexException

2008-04-21 Thread Robert Haschart

Michael,
Following up on this most recent post.  I remembered that the initial 
records were translated into utf-8 prior to indexing, whereas the 
update records are in the marc-8 encoding internally, and the program 
is written to translate them on the fly as they are read in before 
indexing them.  I just tried pre-translating them, and the entire set of 
updates ran.  So at this point it looks like the problem is in my 
marc-8 to utf-8 translation code.  I'll look into this possibility further.


   Thanks again for your help on my earlier problem.
   -Robert Haschart

Robert Haschart wrote:


Michael,

To answer your questions: I completely deleted the index each time 
before retesting, and the java command as shown by "ps" does show 
-Xbatch.

The program is running on:
> uname -a
Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 
07:18:21 EST 2008 i686 i686 i386 GNU/Linux

> more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

after downgrading from the originally reported version of java:   
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)

to this one:
> java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)

the indexing run successfully completed processing all 112 record 
chunks!  Yea!
(with -Xbatch on the command line, I didn't try with the 1.6.0_02 java 
without -Xbatch)



However, I am still seeing a different problem which is what caused me 
to upgrade to Lucene version 2.3.1 and start experiencing the 
CorruptIndexException.


Basically we have a set of 112 files dumped from our OPAC in a binary 
Marc record format, each of which contains about 35000 records.  In 
addition to those files we have a set of daily updates, consisting of 
new records that have been added, and edits for existing records, as 
well as a separate file listing the ids of records to be deleted.


After creating the initial index, I have a script loop through all of 
the update files, adding in all of the new records and updates, and 
then processing all of that day's deletes.  Typically at some point in 
processing the updates, an auto-commit will be triggered.  Eventually 
for one of these auto-commits (not the same one every time) the commit 
will never finish.  The behavior I see is that it will write out 
information about doing a commit (as shown below) and then seemingly do 
nothing ever after, although the CPU % as reported by "ps" for the 
process sits around 90 to 100 % and stays there for days.  While the 
program is sitting there doing this, no changes are made to the files 
in the index.  So it's really not clear what it is doing.
If you have any ideas about this other problem, I would appreciate any 
insight you have.


Adding record 10993: u4386758
Adding record 10994: u4386760
Adding record 10995: u4386767
Adding record 10996: u4386768
Adding record 10997: u4386812
Adding record 10998: u4386816
Adding record 10999: u4386850
Adding record 11000: u4386883
Adding record 11001: u4387066
Adding record 11002: u4387074
Adding record 11003: u4387764
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
commit

INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions

INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions

INFO: DirectUpdateHandler2 docs deleted=11003
Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 
commit

INFO: end_commit_flush
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 


Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 


Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 


Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
 

Re: better stemming engine than Porter?

2008-04-21 Thread Chris Hostetter

: to create an issue, make an account on jira and post it...
: https://issues.apache.org/jira/browse/SOLR
: 
: Give that a try and holler if you have trouble.

To elaborate more (and to save some time answering questions about the 
correct procedure) ...

http://wiki.apache.org/solr/HowToContribute

(note the "contributing code" section)


-Hoss



Re: POST interface to sending queries to SOLR?

2008-04-21 Thread Yonik Seeley
On Mon, Apr 21, 2008 at 4:13 PM, Jim Adams <[EMAIL PROTECTED]> wrote:
> Could you point me to an example somewhere?

The command line tool "curl" can do either GET or POST:

curl http://localhost:8983/solr/select --data 'q=foo&rows=100'

-Yonik
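(For completeness, a hedged Java sketch of the same form-encoded POST; the 
URL and parameters mirror the curl example above, and the class name is 
just for illustration:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SolrPostQuery {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/select");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // The same content-type an HTML form would use
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        String body = "q=" + URLEncoder.encode("foo", "UTF-8") + "&rows=100";
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();
        // Read back the response XML
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);
        }
        in.close();
    }
}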


Re: POST interface to sending queries to SOLR?

2008-04-21 Thread Jim Adams
Could you point me to an example somewhere?

Thanks!

On Wed, Apr 16, 2008 at 10:08 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : I know there is a 'GET' to send queries to Solr.  But is there a POST
> : interface to sending queries?  If so, can someone point me in that
> : direction?
>
> POST using the standard application/x-www-form-urlencoded
> content-type (i.e., the same way you would POST using any HTML form)
>
>
>
> -Hoss
>
>


Default core in multi-core

2008-04-21 Thread James Brady

Hi all,
In the latest trunk version, default='true' doesn't have the effect I  
would have expected running in multi-core mode.


The example multicore.xml has:
 
 

But queries such as
/solr/select?q=*:*
and
/solr/admin/

are executed against core1, not core0 as I would have expected: it  
seems that the last core defined in multicore.xml is the de facto  
'default'.


Is this a bug or am I missing something?

Thanks,
James


Re: CorruptIndexException

2008-04-21 Thread Robert Haschart

Michael,

To answer your questions:  
   I completely deleted the index each time before retesting. 
   and the java command as shown by "ps" does show -Xbatch.

The program is running on:
> uname -a
Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 
07:18:21 EST 2008 i686 i686 i386 GNU/Linux

> more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

after downgrading from the originally reported version of java:   
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)

to this one:
> java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)

the indexing run successfully completed processing all 112 record 
chunks!  Yea!
(with -Xbatch on the command line, I didn't try with the 1.6.0_02 java 
without -Xbatch)



However, I am still seeing a different problem which is what caused me 
to upgrade to Lucene version 2.3.1 and start experiencing the 
CorruptIndexException.


Basically we have a set of 112 files dumped from our OPAC in a binary 
Marc record format, each of which contains about 35000 records.  In 
addition to those files we have a set of daily updates, consisting of 
new records that have been added, and edits for existing records, as 
well as a separate file listing the ids of records to be deleted.


After creating the initial index, I have a script loop through all of 
the update files, adding in all of the new records and updates, and then 
processing all of that day's deletes.  Typically at some point in 
processing the updates, an auto-commit will be triggered.  Eventually 
for one of these auto-commits (not the same one every time) the commit 
will never finish.  The behavior I see is that it will write out 
information about doing a commit (as shown below) and then seemingly do 
nothing ever after, although the CPU % as reported by "ps" for the 
process sits around 90 to 100 % and stays there for days.  While the 
program is sitting there doing this, no changes are made to the files in 
the index.  So it's really not clear what it is doing.

If you have any ideas about this other problem, I would appreciate any 
insight you have.


Adding record 10993: u4386758
Adding record 10994: u4386760
Adding record 10995: u4386767
Adding record 10996: u4386768
Adding record 10997: u4386812
Adding record 10998: u4386816
Adding record 10999: u4386850
Adding record 11000: u4386883
Adding record 11001: u4387066
Adding record 11002: u4387074
Adding record 11003: u4387764
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions

INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2 
doDeletions

INFO: DirectUpdateHandler2 docs deleted=11003
Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming 

Re: better stemming engine than Porter?

2008-04-21 Thread Ryan McKinley

Hey-

to create an issue, make an account on jira and post it...
https://issues.apache.org/jira/browse/SOLR

Give that a try and holler if you have trouble.

ryan



On Apr 21, 2008, at 12:31 PM, Wagner,Harry wrote:

Hi HH,
Here's a note I sent Solr-dev a while back:

---
I've implemented a Solr plug-in that wraps KStem for Solr use (someone
else had already written a Lucene wrapper for it).  KStem is considered 
to be more appropriate for library usage since it is much less 
aggressive than Porter (i.e., searches for organization do NOT match on 
organ!). If there is any interest in feeding this back into Solr I would 
be happy to contribute it.
---

I believe there was interest in it, but I never opened an issue for it
and I don't know if it was ever followed up on. I'd be happy to do that 
now. Can someone on the Solr-dev team point me in the right direction
for opening an issue?

Thanks... harry


-Original Message-
From: Hung Huynh [mailto:[EMAIL PROTECTED]
Sent: Monday, April 21, 2008 11:59 AM
To: solr-user@lucene.apache.org
Subject: better stemming engine than Porter?

I recall I've read somewhere in one of the mailing-list archives that 
someone had developed a better stemming algo for Solr than the built-in 
Porter stemming. Does anyone have a link to that stemming module?

Thanks,

HH







RE: better stemming engine than Porter?

2008-04-21 Thread Wagner,Harry
Hi HH,
Here's a note I sent Solr-dev a while back:

---
I've implemented a Solr plug-in that wraps KStem for Solr use (someone
else had already written a Lucene wrapper for it).  KStem is considered
to be more appropriate for library usage since it is much less
aggressive than Porter (i.e., searches for organization do NOT match on
organ!). If there is any interest in feeding this back into Solr I would
be happy to contribute it.
---

I believe there was interest in it, but I never opened an issue for it
and I don't know if it was ever followed up on. I'd be happy to do that
now. Can someone on the Solr-dev team point me in the right direction
for opening an issue?

Thanks... harry
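(A plug-in of this shape is typically a Solr TokenFilterFactory. A minimal, 
hedged sketch -- KStemFilter here stands in for the pre-existing Lucene 
wrapper mentioned above, and its name is assumed:)

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class KStemFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        // Wrap the incoming token stream with the KStem stemmer
        return new KStemFilter(input);
    }
}

It would then be wired into an analyzer chain in schema.xml with something 
like <filter class="com.yourpackage.KStemFilterFactory"/>.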


-Original Message-
From: Hung Huynh [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 21, 2008 11:59 AM
To: solr-user@lucene.apache.org
Subject: better stemming engine than Porter?

I recall I've read somewhere in one of the mailing-list archives that 
someone had developed a better stemming algo for Solr than the built-in 
Porter stemming. Does anyone have a link to that stemming module?

Thanks,

HH 





Re: case insensitive sorting

2008-04-21 Thread Shalin Shekhar Mangar
In your schema.xml, make sure the type specified for usernameSort field has
the LowerCaseFilterFactory applied on it.
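For example, a hedged sketch of such a type, along the lines of the 
"alphaOnlySort" type in the stock example schema (the source field name 
"username" is an assumption):

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="usernameSort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="username" dest="usernameSort"/>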

On Mon, Apr 21, 2008 at 9:34 PM, Ismail Siddiqui <[EMAIL PROTECTED]> wrote:

> Hi all
> in my schema.xml I have the following entry
> 
>
> but the problem I am facing is that when I sort on usernameSort it does
> case-sensitive sorting, i.e. first uppercase, then lowercase.
> I want to do case-insensitive sorting. Is there any way, when I copyField
> it, to change it to all lower case, or do I have to change it to lowercase
> when I am indexing it?
>
>
>
> thanks
>
> ismail Siddiqui
>



-- 
Regards,
Shalin Shekhar Mangar.


case insensitive sorting

2008-04-21 Thread Ismail Siddiqui
Hi all
in my schema.xml I have the following entry


but the problem I am facing is that when I sort on usernameSort it does
case-sensitive sorting, i.e. first uppercase, then lowercase.
I want to do case-insensitive sorting. Is there any way, when I copyField it,
to change it to all lower case, or do I have to change it to lowercase
when I am indexing it?



thanks

ismail Siddiqui


better stemming engine than Porter?

2008-04-21 Thread Hung Huynh
I recall I've read somewhere in one of the mailing-list archives that someone
had developed a better stemming algo for Solr than the built-in Porter
stemming. Does anyone have a link to that stemming module?

Thanks,

HH 



RE: How to troubleshoot this HTTP ERROR: 500 (NULL) error?

2008-04-21 Thread Hung Huynh
Thanks. I fixed the schema and it's working now.

Also my other problem of "not all defined fields showing up in the results"
is resolved. I found out last night that Solr field names are case-sensitive.
I typed all the fields in the schema in lower case, and my CSV files had mixed
case for some of the fields. These mixed-case fields do not show up in the
XML results. Sorry, I've been working in the Windows environment for so long
that I forgot about the case-sensitive Unix environment.
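(For instance -- a hedged illustration -- values under a CSV column headed 
"Title" will not land in a schema field declared as <field name="title" ... />, 
since the column name must match the schema field name exactly.)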

Thanks,

HH

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 18, 2008 4:44 PM
To: solr-user@lucene.apache.org
Subject: Re: How to troubleshoot this HTTP ERROR: 500 (NULL) error?


: java.lang.NullPointerException
: 
:   at
:
org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:73
: )

I'm guessing this is one of the following issues...

http://issues.apache.org/jira/browse/SOLR-525
http://issues.apache.org/jira/browse/SOLR-529

...both have been changed in the trunk to provide better error messages, 
but you can fix the root cause yourself (check your query/schema 
to figure out which problem it is)


-Hoss




Re: XSLT transform before update?

2008-04-21 Thread David Smiley @MITRE.org

Cool.  So you're saying that this xslt file will operate on the entire XML
document that was fetched from the URL and just pass it on to solr?  Thanks
for supporting this.  The XML files I have coming from my data source
are big, but not so big as to risk an out-of-memory error.  And I've found
xslt to perform fast for me.  I like your proposed TemplateTransformer
too... I'm tempted to use that in place of XSLT.  Great job Paul.

It'd be neat to have an XSLT transformer for your framework that operates on
a single entity (that addresses the memory usage problem).  I know your
entities are HashMap based instead of XML, however.

~ David


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> We are planning to incorporate both your requests in the next patch.
> The implementation is going to be as follows: mention the xsl file
> location as follows
> 
> 
> 
> So the processing will be done after the XSL transformation. If after
> your XSL transformation it produces a valid 'add' document, not even
> fields are necessary. Otherwise you will need to write all the fields
> and their xpaths like any other xml
> 
> <entity ... useSolrAddXml="true"/>
> 
> So it will assume that the schema is the same as that of the add xml and
> does the needful.
> 
> Another feature is going to be a TemplateTransformer, which takes in a
> Template as follows
> 
> 
> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>>  >
>>  > For example, let's say you get fields first-name and last-name in the XML.
>>  > But in the schema.xml you have a field called "name" in which you need to
>>  > concatenate the values of first-name and last-name (with a space in
>>  > between). Create a Java class:
>>  >
>>  > public class ConcatenateTransformer { public Object
>>  > transformRow(Map<String, Object> row) { String firstName = (String)
>>  > row.get("first-name"); String lastName = (String) row.get("last-name");
>>  > row.put("name", firstName + " " + lastName); return row; } }
>>  >
>>  > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
>>  >
>>  > The data-config.xml should look like this:
>>  > <entity name="..." url="http://myurl/example.xml"
>>  > transformer="com.yourpackage.ConcatenateTransformer"> <field
>>  > column="first-name" xpath="/record/first-name" /> <field
>>  > column="last-name" xpath="/record/last-name" /> </entity>
>>  >
>>  > This will call ConcatenateTransformer.transformRow method for each row and
>>  > you can concatenate any field with any field (or constant). Note that solr
>>  > document will keep only those fields which are in the schema.xml, the rest
>>  > are thrown away.
>>  >
>>  > If you don't want to write this in Java, you can use JavaScript by using
>>  > the built-in ScriptTransformer, for an example look at
>>  >
>> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>>  >
>>  > However, I'm beginning to realize that XSLT is a common need, let me see
>>  > how best we can accommodate it in DataImportHandler. Which XSLT processor
>>  > will you prefer?
>>  >
>>  > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
>>  > <[EMAIL PROTECTED]>
>>  > wrote:
>>  >
>>  >>
>>  >> I'm in the same situation as you Daniel.  The DataImportHandler is pretty
>>  >> awesome but I'd also prefer it had the power of XSLT.  The XPath support
>>  >> in it doesn't suffice for me.  And I can't do very basic things like
>>  >> concatenate one value with another, say a constant even.  It's too bad
>>  >> there isn't a mode that XSLT can be put into to not build the whole file
>>  >> into memory to do the transform.  I've been looking into this and have
>>  >> turned up nothing.  It would be neat if there was a StAX to multi-document
>>  >> adapter, at which point XSLT could be applied to the smaller fixed-size
>>  >> documents instead of the entire data stream.  I haven't found anything
>>  >> like this so it'd need to be built.  For now my documents aren't too big
>>  >> to XSLT in-memory.
>>  >>
>>  >> ~ David
>>  >>
>>  >>
>>  >> Daniel Papasian wrote:
>>  >> >
>>  >> > Shalin Shekhar Mangar wrote:
>>  >> >> Hi Daniel,
>>  >> >>
>>  >> >> Maybe if you can give us a sample of how your XML looks like, we can
>>  >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>>  >> >> the use-cases we have yet encountered are solvable using the
>>  >> >> XPathEntityProcessor in DataImportHandler without using XSLT, for
>>  >> >> details look at
>>  >> >>
>>  >>
>> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>>  >> >
>>  >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>>  >> > prefer the XSLT approach, because it's going to be a bit of
>>  >> > configuration either way, and I'd rather it be an XSLT stylesheet than
>>  >> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>>  >> > a

Re: XSLT transform before update?

2008-04-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
We are planning to incorporate both your requests in the next patch.
The implementation is going to be as follows: mention the xsl file
location as follows



So the processing will be done after the XSL transformation. If after
your XSL transformation it produces a valid 'add' document, not even
fields are necessary. Otherwise you will need to write all the fields
and their xpaths like any other xml

<entity ... useSolrAddXml="true"/>

So it will assume that the schema is the same as that of the add xml and
does the needful.
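(Putting the quoted fragments together, a hedged sketch of what such a 
data-config.xml might look like; the HttpDataSource type, entity name, url, 
and xsl path are assumptions:)

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="records" url="http://myurl/example.xml"
            xsl="path/to/transform.xsl" useSolrAddXml="true" />
  </document>
</dataConfig>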

Another feature is going to be a TemplateTransformer, which takes in a
Template as follows


http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>  >
>  > For example, let's say you get fields first-name and last-name in the XML.
>  > But in the schema.xml you have a field called "name" in which you need to
>  > concatenate the values of first-name and last-name (with a space in
>  > between). Create a Java class:
>  >
>  > public class ConcatenateTransformer { public Object
>  > transformRow(Map<String, Object> row) { String firstName = (String)
>  > row.get("first-name"); String lastName = (String) row.get("last-name");
>  > row.put("name", firstName + " " + lastName); return row; } }
>  >
>  > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
>  >
>  > The data-config.xml should look like this:
>  > <entity name="..." url="http://myurl/example.xml"
>  > transformer="com.yourpackage.ConcatenateTransformer"> <field
>  > column="first-name" xpath="/record/first-name" /> <field
>  > column="last-name" xpath="/record/last-name" /> </entity>
>  >
>  > This will call ConcatenateTransformer.transformRow method for each row and
>  > you can concatenate any field with any field (or constant). Note that solr
>  > document will keep only those fields which are in the schema.xml, the rest
>  > are thrown away.
>  >
>  > If you don't want to write this in Java, you can use JavaScript by using
>  > the
>  > built-in ScriptTransformer, for an example look at
>  > 
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>  >
>  > However, I'm beginning to realize that XSLT is a common need, let me see
>  > how
>  > best we can accommodate it in DataImportHandler. Which XSLT processor will
>  > you prefer?
>  >
>  > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
>  > <[EMAIL PROTECTED]>
>  > wrote:
>  >
>  >>
>  >> I'm in the same situation as you Daniel.  The DataImportHandler is pretty
>  >> awesome but I'd also prefer it had the power of XSLT.  The XPath support
>  >> in it doesn't suffice for me.  And I can't do very basic things like
>  >> concatenate one value with another, say a constant even.  It's too bad
>  >> there isn't a mode that XSLT can be put into to not build the whole file
>  >> into memory to do the transform.  I've been looking into this and have
>  >> turned up nothing.  It would be neat if there was a StAX to multi-document
>  >> adapter, at which point XSLT could be applied to the smaller fixed-size
>  >> documents instead of the entire data stream.  I haven't found anything
>  >> like this so it'd need to be built.  For now my documents aren't too big
>  >> to XSLT in-memory.
>  >>
>  >> ~ David
>  >>
>  >>
>  >> Daniel Papasian wrote:
>  >> >
>  >> > Shalin Shekhar Mangar wrote:
>  >> >> Hi Daniel,
>  >> >>
>  >> >> Maybe if you can give us a sample of how your XML looks like, we can
>  >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>  >> >> the use-cases we have yet encountered are solvable using the
>  >> >> XPathEntityProcessor in DataImportHandler without using XSLT, for
>  >> >> details look at
>  >> >>
>  >> 
> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>  >> >
>  >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>  >> > prefer the XSLT approach, because it's going to be a bit of
>  >> > configuration either way, and I'd rather it be an XSLT stylesheet than
>  >> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>  >> > apply any patches to the version that we will deploy, but if I do go
>  >> > down the route of the XSLT transform patch, if I end up having to back
>  >> > it out, the amount of work that it would be for me to do the transform
>  >> > at the XML source would be negligible, where it would be quite a bit of
>  >> > work ahead of me to go from using the DataImportHandler to not using it
>  >> > at all.
>  >> >
>  >> > Because both the solr instance and the XML source are in house, I have
>  >> > the ability to apply the XSLT at the source instead of at solr.
>  >> > However, there are different teams of people that control the XML
>  >> > source and solr, so it would require a bit more office coordination to
>  >> > do it on the backend.
>  >> >
>  >> > The data is a filemaker XML export (DTD fmresultset) and it looks
>  >> > roughly like this:
>  >> > 
>  >> >
>  >> >