[jira] Created: (SOLR-1783) adding all documents, not just collapsed ones, to the collapsed document group would give us resultset wide aggregate functions like sum/avg/etc

2010-02-19 Thread Gerald DeConto (JIRA)
adding all documents, not just collapsed ones, to the collapsed document group 
would give us resultset wide aggregate functions like sum/avg/etc


 Key: SOLR-1783
 URL: https://issues.apache.org/jira/browse/SOLR-1783
 Project: Solr
  Issue Type: Wish
  Components: search
Affects Versions: 1.4
 Environment: all
Reporter: Gerald DeConto
Priority: Minor
 Fix For: 1.5


Suggest we add functionality to allow inclusion of non-collapsed documents in 
the collapseCount and aggregate function values of field collapsing (SOLR-236),

i.e. include ALL documents, not just the collapsed ones, possibly via some 
parameter like collapse.includeAllDocs?

I think this would be a great addition to the collapse code (and Solr 
functionality), via what I would think is a small change, since Solr doesn't 
have any other aggregation mechanism (as yet).
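
To make the request concrete, here is a minimal sketch of the difference between aggregating over the collapsed documents only and aggregating over every document in a group. It is not the SOLR-236 code: the function name, the `include_all_docs` flag (standing in for the suggested collapse.includeAllDocs parameter) and the sample fields are assumptions for illustration, and the first document of each group is simply assumed to be the one that stays visible.

```python
from collections import defaultdict


def collapse_aggregates(docs, collapse_field, value_field, include_all_docs=False):
    """Return {group_key: {"count": n, "sum": s, "avg": a}} per collapse group.

    The first document seen for each key is treated as the group head (the
    one left in the result list); the remaining ones are the collapsed docs.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[collapse_field]].append(doc)

    result = {}
    for key, members in groups.items():
        # Without the flag only the collapsed (hidden) documents are counted.
        counted = members if include_all_docs else members[1:]
        values = [d[value_field] for d in counted]
        total = sum(values)
        result[key] = {
            "count": len(values),
            "sum": total,
            "avg": total / len(values) if values else None,
        }
    return result


# Hypothetical sample data: three docs collapse under brand "a", one under "b".
docs = [
    {"id": 1, "brand": "a", "price": 10},
    {"id": 2, "brand": "a", "price": 20},
    {"id": 3, "brand": "a", "price": 30},
    {"id": 4, "brand": "b", "price": 5},
]
collapsed_only = collapse_aggregates(docs, "brand", "price")
all_docs = collapse_aggregates(docs, "brand", "price", include_all_docs=True)
```

With the flag on, a group that contains a single document still contributes to the sums, which is what makes the values usable as result-set-wide aggregates.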


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836022#action_12836022
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a URL type parameter to define if a URL in a core info is to a zip file 
or to a Solr server download point.  

> Real Basic Core Management with Zookeeper
> -
>
> Key: SOLR-1724
> URL: https://issues.apache.org/jira/browse/SOLR-1724
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
> Fix For: 1.5
>
> Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
> hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
> SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
> SOLR-1724.patch, SOLR-1724.patch
>
>
> Though we're implementing cloud, I need something real soon I can
> play with and deploy. So this'll be a patch that only deploys
> new cores, and that's about it. The arch is real simple:
> On Zookeeper there'll be a directory that contains files that
> represent the state of the cores of a given set of servers which
> will look like the following:
> /production/cores-1.txt
> /production/cores-2.txt
> /production/core-host-1-actual.txt (ephemeral node per host)
> Where each core-N.txt file contains:
> hostname,corename,instanceDir,coredownloadpath
> coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
> etc
> and
> core-host-actual.txt contains:
> hostname,corename,instanceDir,size
> Every time a new core-N.txt file is added, the listening host
> finds its entry in the list and begins the process of trying to
> match the entries. Upon completion, it updates its
> /core-host-1-actual.txt file to its completed state or logs an error.
> When all host actual files are written (without errors), then a
> new core-1-actual.txt file is written which can be picked up by
> another process that can create a new core proxy.
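
The file format and matching step described above could look roughly like the sketch below. It only covers parsing the comma-separated lines and working out which cores a host still has to fetch; the class and function names, the sample hosts and the sample URLs are all hypothetical, and nothing here talks to ZooKeeper.

```python
from typing import NamedTuple


class CoreEntry(NamedTuple):
    hostname: str
    corename: str
    instance_dir: str
    download_path: str


def parse_core_file(text):
    """Parse lines of the form hostname,corename,instanceDir,coredownloadpath."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=3 keeps any commas inside the download URL intact.
        hostname, corename, instance_dir, download_path = line.split(",", 3)
        entries.append(CoreEntry(hostname, corename, instance_dir, download_path))
    return entries


def cores_to_install(desired, actual_corenames, this_host):
    """Entries for this host whose core is not yet in its 'actual' state."""
    return [
        e for e in desired
        if e.hostname == this_host and e.corename not in actual_corenames
    ]


# Made-up contents of one core-N.txt file.
sample = """host1,core-a,/solr/core-a,hdfs://nn/indexes/core-a.zip
host2,core-b,/solr/core-b,http://files.example.com/core-b.zip
host1,core-c,/solr/core-c,file:///tmp/core-c.zip
"""
desired = parse_core_file(sample)
todo = cores_to_install(desired, {"core-a"}, "host1")
```

Here host1 already reports core-a, so only core-c remains to be downloaded before it can rewrite its actual file.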




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836018#action_12836018
 ] 

Jason Rutherglen commented on SOLR-1724:


Some further notes... I can reuse the replication code, but am going to place 
the functionality into core admin handler because it needs to work across cores 
and not have to be configured in each core's solrconfig.  

Also, we need to somehow support merging cores... Is that available yet?  Looks 
like merge indexes is only for directories?




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836013#action_12836013
 ] 

Jason Rutherglen commented on SOLR-1724:


I think the check on whether a conf file's been modified, to reload the core, 
can borrow from the replication handler and check the diff based on the 
checksum of the files... Though this somewhat complicates the storage of the 
checksum and the resultant JSON file.
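
A rough sketch of the checksum comparison, assuming the stored state is just a mapping from relative file name to digest. SHA-256 is an arbitrary choice made for this example and not necessarily what the replication handler uses; the function names and the temporary conf directory are likewise hypothetical.

```python
import hashlib
import os
import tempfile


def file_checksums(conf_dir):
    """Map each file under conf_dir (by relative path) to a hex digest."""
    sums = {}
    for root, _dirs, files in os.walk(conf_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            sums[os.path.relpath(path, conf_dir)] = digest
    return sums


def changed_files(old, new):
    """Names that were added, removed, or whose digest differs."""
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))


# Demo on a throwaway directory standing in for a core's conf directory.
with tempfile.TemporaryDirectory() as conf_dir:
    with open(os.path.join(conf_dir, "schema.xml"), "w") as fh:
        fh.write("<schema/>")
    with open(os.path.join(conf_dir, "solrconfig.xml"), "w") as fh:
        fh.write("<config/>")
    before = file_checksums(conf_dir)

    with open(os.path.join(conf_dir, "schema.xml"), "w") as fh:
        fh.write("<schema version='2'/>")
    after = file_checksums(conf_dir)

needs_reload = changed_files(before, after)
```

A non-empty list would be the signal to reload the core. The complication mentioned above shows up in `before`: that mapping has to be stored somewhere between checks.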




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835981#action_12835981
 ] 

Jason Rutherglen commented on SOLR-1724:


{quote}Will this http access also allow a cluster with
incrementally updated cores to replicate a core after a node
failure? {quote}

You're talking about moving an existing core into HDFS? That's a
great idea... I'll add it to the list!

Maybe for general "actions" to the system, there can be a ZK
directory acting as a queue that contains actions to be
performed by the cluster. When an action is completed, its
corresponding action file is deleted.
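
The queue idea can be sketched without a ZooKeeper client by using an in-memory dictionary as a stand-in for the directory. The class, the node naming (zero-padded counters, loosely mimicking sequential znodes) and the sample actions are assumptions for illustration only.

```python
class ActionQueue:
    """In-memory stand-in for a ZooKeeper directory used as an action queue."""

    def __init__(self):
        self._nodes = {}
        self._seq = 0

    def submit(self, payload):
        """Add an action node; names sort in submission order."""
        name = "action-%010d" % self._seq
        self._seq += 1
        self._nodes[name] = payload
        return name

    def pending(self):
        return sorted(self._nodes)

    def process(self, handler):
        """Run handler on each pending action, deleting the node on success."""
        done = []
        for name in self.pending():
            try:
                handler(self._nodes[name])
            except Exception:
                # Leave the node in place so the action can be retried later.
                continue
            del self._nodes[name]
            done.append(name)
        return done


queue = ActionQueue()
queue.submit("move core-a to host2")
queue.submit("fail")
queue.submit("reload core-b")

performed = []


def handler(action):
    if action == "fail":
        raise RuntimeError("simulated failure")
    performed.append(action)


completed = queue.process(handler)
```

Deleting the node only after the handler returns means a crashed or failed action stays visible in the directory, at the cost of possibly running an action twice.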




Can we move FileFetcher out of SnapPuller?

2010-02-19 Thread Jason Rutherglen
Can we move FileFetcher out of SnapPuller? This will assist with
reusing the replication handler for moving/copying cores.


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835965#action_12835965
 ] 

Jason Rutherglen commented on SOLR-1724:


For the above core moving, utilizing the existing Java replication will 
probably be suitable.  However, in all cases we need to copy the contents of 
all files related to the core (meaning everything under conf and data).  How 
does one accomplish this?
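
One possible starting point, sketched under the assumption that the sending side can walk the core's instance directory: build a manifest of every file under conf and data, which a fetcher could then request one by one. The function name and the demo directory layout are hypothetical and this is not how the existing replication code enumerates files.

```python
import os
import tempfile


def core_file_manifest(instance_dir, subdirs=("conf", "data")):
    """List (relative_path, size_in_bytes) for every file under the subdirs."""
    manifest = []
    for sub in subdirs:
        base = os.path.join(instance_dir, sub)
        # os.walk yields nothing if a subdirectory does not exist.
        for root, _dirs, files in os.walk(base):
            for name in files:
                path = os.path.join(root, name)
                rel = os.path.relpath(path, instance_dir).replace(os.sep, "/")
                manifest.append((rel, os.path.getsize(path)))
    return sorted(manifest)


# Demo on a throwaway directory laid out like a core instance dir.
with tempfile.TemporaryDirectory() as instance_dir:
    os.makedirs(os.path.join(instance_dir, "conf"))
    os.makedirs(os.path.join(instance_dir, "data", "index"))
    for rel, body in [("conf/schema.xml", "<schema/>"),
                      ("data/index/segments_1", "abc")]:
        with open(os.path.join(instance_dir, *rel.split("/")), "w") as fh:
            fh.write(body)
    manifest = core_file_manifest(instance_dir)
```

The sizes give the receiving side a cheap way to verify each transfer.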




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835963#action_12835963
 ] 

Ted Dunning commented on SOLR-1724:
---


Will this http access also allow a cluster with incrementally updated cores to 
replicate a core after a node failure?




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835955#action_12835955
 ] 

Jason Rutherglen commented on SOLR-1724:


Also needed is the ability to move an existing core to a
different Solr server. The core will need to be copied via
direct HTTP file access, from a Solr server to another Solr
server. There is no need to zip the core first. 

This feature is useful for core indexes that have been
incrementally built, then need to be archived (i.e. the index was not
constructed using Hadoop).




[jira] Commented: (SOLR-1777) fields with sortMissingLast don't sort correctly

2010-02-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835937#action_12835937
 ] 

Yonik Seeley commented on SOLR-1777:


bq. Correction: Tom Hill and I have seen this bug in the distant past on Solr 
1.2 or 1.3. 

Then it was a different bug.  This code was all new for 1.4
Was it reproducible, and was a bug report filed?  Can it still be reproduced?

> fields with sortMissingLast don't sort correctly
> 
>
> Key: SOLR-1777
> URL: https://issues.apache.org/jira/browse/SOLR-1777
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Critical
> Fix For: 1.5
>
> Attachments: SOLR-1777.patch, SOLR-1777.patch
>
>
> field types with the sortMissingLast=true attribute can have results sorted 
> incorrectly.




[jira] Commented: (SOLR-1777) fields with sortMissingLast don't sort correctly

2010-02-19 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835936#action_12835936
 ] 

Lance Norskog commented on SOLR-1777:
-

Correction: Tom Hill and I have seen this bug in the distant past on Solr 1.2 
or 1.3.




[jira] Issue Comment Edited: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835871#action_12835871
 ] 

Jason Rutherglen edited comment on SOLR-1724 at 2/19/10 6:36 PM:
-

Removing cores seems to work well, on to modified cores... I'm checkpointing 
progress in case things break, I can easily roll back.

  was (Author: jasonrutherglen):
Removing cores seems to work well, on to modified cores... I checkpointing 
progress in case things break, I can easily roll back.
  



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1724:
---

Attachment: SOLR-1724.patch

Removing cores seems to work well, on to modified cores... I checkpointing 
progress in case things break, I can easily roll back.




[jira] Reopened: (SOLR-1297) Enable sorting by Function Query

2010-02-19 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reopened SOLR-1297:



function queries aren't weighted... reopening to track this problem.

> Enable sorting by Function Query
> 
>
> Key: SOLR-1297
> URL: https://issues.apache.org/jira/browse/SOLR-1297
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1297.patch
>
>
> It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
> where this was first mentioned by Yonik as part of the generic solution to 
> geo-search




[jira] Updated: (SOLR-1395) Integrate Katta

2010-02-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated SOLR-1395:
--

Attachment: solr-1395-1431-katta0.6.patch

I've updated the patch for Katta 0.6; however, I deleted the SolrIndexer class 
since I don't need it and it relies on the indexer contribution to Katta, which 
seems to be deprecated.
I still need to work on this patch, because I need the functionality to search 
all registered indexes. I'd appreciate any help!

> Integrate Katta
> ---
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
> katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, 
> solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
> solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, SOLR-1395.patch, 
> SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
> zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop




[jira] Issue Comment Edited: (SOLR-1782) unexpected statscomponent values

2010-02-19 Thread Gerald DeConto (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835821#action_12835821
 ] 

Gerald DeConto edited comment on SOLR-1782 at 2/19/10 5:08 PM:
---

file contains readme.txt, sample data, solrconfig.xml, data-config.xml and 
schema.xml

Additional info (i.e. results from running the test) can be found at 
http://old.nabble.com/getting-unexpected-statscomponent-values-td27599248.html


  was (Author: gdeconto):
file contains readme.txt, sample data, solrconfig.xml, data-config.xml and 
schema.xml
  
> unexpected statscomponent values
> 
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar
>
>
> I wanted to understand the statscomponent better, so I set up a simple test 
> index with a few thousand docs.  In my schema I have: 
> - an indexed multivalued sint field (StatsFacetField) that can contain values 
> 0 through 5 that I want to use as my stats.facet field. 
> - an indexed single-valued sint field (ValueOfOneField) that will always 
> contain the value 1 and that I want stats on for this test 
> When I execute the following query: 
> http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField
> For this situation (*:*) I was expecting the statscomponent Count/Sum 
> values for each possible value in StatsFacetField to match the facet values 
> for StatsFacetField.  They don't.  Some are close (e.g. 204 vs 214) while 
> others are way off (e.g. 230 vs 8000).
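
The expectation in the report can be written down as a small model. This is a sketch of what the reporter expected to see, not of how Solr's StatsComponent or faceting is implemented; the helper names and the three sample documents are made up, and only the two field names come from the report.

```python
from collections import Counter, defaultdict


def facet_counts(docs, facet_field):
    """Number of documents containing each value of a multivalued field."""
    counts = Counter()
    for doc in docs:
        for value in set(doc[facet_field]):
            counts[value] += 1
    return dict(counts)


def stats_by_facet(docs, stats_field, facet_field):
    """Count and sum of stats_field, broken down by each facet value."""
    stats = defaultdict(lambda: {"count": 0, "sum": 0})
    for doc in docs:
        for value in set(doc[facet_field]):
            stats[value]["count"] += 1
            stats[value]["sum"] += doc[stats_field]
    return dict(stats)


# Hypothetical documents: ValueOfOneField is always 1, as in the report.
docs = [
    {"StatsFacetField": [0, 1], "ValueOfOneField": 1},
    {"StatsFacetField": [1, 5], "ValueOfOneField": 1},
    {"StatsFacetField": [1], "ValueOfOneField": 1},
]
facets = facet_counts(docs, "StatsFacetField")
stats = stats_by_facet(docs, "ValueOfOneField", "StatsFacetField")
```

Because the stats field is always 1, count and sum for each facet value should both equal that value's facet count under this model, which is the equality the report says does not hold.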




[jira] Updated: (SOLR-1782) unexpected statscomponent values

2010-02-19 Thread Gerald DeConto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerald DeConto updated SOLR-1782:
-

Attachment: index.rar

file contains readme.txt, sample data, solrconfig.xml, data-config.xml and 
schema.xml




[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835819#action_12835819
 ] 

Jason Rutherglen commented on SOLR-1724:


We need a test case for deleted and modified cores.

> Real Basic Core Management with Zookeeper
> -
>
> Key: SOLR-1724
> URL: https://issues.apache.org/jira/browse/SOLR-1724
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
> Fix For: 1.5
>
> Attachments: commons-lang-2.4.jar, gson-1.4.jar, 
> hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, 
> SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, 
> SOLR-1724.patch
>
>
> Though we're implementing cloud, I need something real soon I can
> play with and deploy. So this'll be a patch that only deploys
> new cores, and that's about it. The arch is real simple:
> On Zookeeper there'll be a directory that contains files that
> represent the state of the cores of a given set of servers which
> will look like the following:
> /production/cores-1.txt
> /production/cores-2.txt
> /production/core-host-1-actual.txt (ephemeral node per host)
> Where each core-N.txt file contains:
> hostname,corename,instanceDir,coredownloadpath
> coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, 
> etc
> and
> core-host-actual.txt contains:
> hostname,corename,instanceDir,size
> Every time a new core-N.txt file is added, the listening host
> finds its entry in the list and begins the process of trying to
> match the entries. Upon completion, it updates its
> /core-host-1-actual.txt file to its completed state or logs an error.
> When all host actual files are written (without errors), then a
> new core-1-actual.txt file is written which can be picked up by
> another process that can create a new core proxy.
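
A minimal sketch of how a listening host might read the entries described above, assuming one comma-separated record per line in the order hostname,corename,instanceDir,coredownloadpath. The class and method names (CoreEntry, parse, entriesForHost) are hypothetical and are not taken from SOLR-1724.patch; reading the file from Zookeeper is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one line of a core-N.txt file:
// hostname,corename,instanceDir,coredownloadpath
final class CoreEntry {
    final String hostname;
    final String coreName;
    final String instanceDir;
    final String downloadPath;

    CoreEntry(String hostname, String coreName, String instanceDir, String downloadPath) {
        this.hostname = hostname;
        this.coreName = coreName;
        this.instanceDir = instanceDir;
        this.downloadPath = downloadPath;
    }

    // Parses one record. The split limit of 4 keeps any commas that
    // appear inside the download URL as part of the last field.
    static CoreEntry parse(String line) {
        String[] parts = line.trim().split(",", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new CoreEntry(parts[0].trim(), parts[1].trim(), parts[2].trim(), parts[3].trim());
    }

    // Returns only the records addressed to the given host, skipping blank lines.
    static List<CoreEntry> entriesForHost(List<String> lines, String host) {
        List<CoreEntry> result = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            CoreEntry entry = parse(line);
            if (entry.hostname.equals(host)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("host1,core-a,/solr/core-a,hdfs://namenode/cores/core-a.zip");
        lines.add("host2,core-b,/solr/core-b,http://example.org/core-b.zip");
        for (CoreEntry e : entriesForHost(lines, "host1")) {
            System.out.println(e.coreName + " <- " + e.downloadPath);
        }
    }
}
```

Filtering by hostname first mirrors the step where each host "finds its entry in the list"; downloading the core and writing the actual file would follow from the returned entries.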




[jira] Created: (SOLR-1782) unexpected statscomponent values

2010-02-19 Thread Gerald DeConto (JIRA)
unexpected statscomponent values


 Key: SOLR-1782
 URL: https://issues.apache.org/jira/browse/SOLR-1782
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.4
 Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
Reporter: Gerald DeConto


I wanted to understand the StatsComponent better, so I set up a simple test 
index with a few thousand docs.  In my schema I have: 
- an indexed multivalued sint field (StatsFacetField) that can contain values 0 
through 5 and that I want to use as my stats.facet field. 
- an indexed single-valued sint field (ValueOfOneField) that will always contain 
the value 1 and that I want stats on for this test 

When I execute the following query: 

http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField

For this situation (*:*) I was expecting the StatsComponent Count/Sum 
values for each possible value in StatsFacetField to match the facet values for 
StatsFacetField.  They don't.  Some are close (e.g. 204 vs 214) while others are 
way off (e.g. 230 vs 8000)
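
The expectation above can be checked on paper with a small in-memory model. This is only a sketch of the arithmetic, not of how Solr computes either number: each document is modeled as the set of values in its multivalued StatsFacetField, and every document carries ValueOfOneField = 1. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class StatsFacetExpectation {

    // Facet count: for each value, the number of documents whose
    // multivalued field contains that value.
    static Map<Integer, Integer> facetCounts(List<Set<Integer>> docs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Set<Integer> values : docs) {
            for (int v : values) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Stats sum per facet value: the document's numeric field is added
    // once for every facet value the document holds.
    static Map<Integer, Integer> statsSums(List<Set<Integer>> docs, int fieldValue) {
        Map<Integer, Integer> sums = new TreeMap<>();
        for (Set<Integer> values : docs) {
            for (int v : values) {
                sums.merge(v, fieldValue, Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Set<Integer>> docs = new ArrayList<>();
        Set<Integer> first = new HashSet<>();
        first.add(0);
        first.add(1);
        docs.add(first);
        Set<Integer> second = new HashSet<>();
        second.add(1);
        second.add(5);
        docs.add(second);
        // With a field value of 1 the two maps are expected to be equal.
        System.out.println(facetCounts(docs).equals(statsSums(docs, 1)));
    }
}
```

Under this model the per-value sum of a field that is always 1 equals the per-value facet count, which is the match the report says it did not see.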




DIH - One Table but Different sections.

2010-02-19 Thread stocki

Hello.

I have the following problem.

I have a big items table that should be indexed with Solr and DIH.
Indexing the complete table works fine.

But this table holds items from different shops, and I only want to index items
from particular shops, e.g. WHERE ID = 55.

Is it possible to pass the DIH a parameter with the shop_ID so that only this
shop gets indexed?


-- 
View this message in context: 
http://old.nabble.com/DIH---One-Table-but-Different-sections.-tp27656967p27656967.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Commented: (SOLR-1297) Enable sorting by Function Query

2010-02-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835796#action_12835796
 ] 

Grant Ingersoll commented on SOLR-1297:
---

See the processSort() method in QueryParsing.  It seemed like the logical way to 
let the ValueSource define how sorting on it should work, essentially 
mirroring getSortField from the FieldType.

> Enable sorting by Function Query
> 
>
> Key: SOLR-1297
> URL: https://issues.apache.org/jira/browse/SOLR-1297
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1297.patch
>
>
> It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
> where this was first mentioned by Yonik as part of the generic solution to 
> geo-search




[jira] Commented: (SOLR-1297) Enable sorting by Function Query

2010-02-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835795#action_12835795
 ] 

Yonik Seeley commented on SOLR-1297:


Just curious - what were the reasons behind adding a getSortField() method to 
ValueSource?

> Enable sorting by Function Query
> 
>
> Key: SOLR-1297
> URL: https://issues.apache.org/jira/browse/SOLR-1297
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1297.patch
>
>
> It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
> where this was first mentioned by Yonik as part of the generic solution to 
> geo-search




Re: poly fields in fieldcache

2010-02-19 Thread patrick o'leary
> Have you looked at UninvertedField?  Not sure if it is what you are after,
> but it is essentially multivalued FC.

Yeah, not that much is public in UnInvertedField.


On Fri, Feb 19, 2010 at 6:56 AM, Grant Ingersoll wrote:

>
> On Feb 18, 2010, at 1:54 PM, patrick o'leary wrote:
>
> > Cool, we want to examine certain fields of a docset which currently is a
> > multivalued field, obviously only the first value gets loaded into field
> > cache.
> >
> > But if a poly field can be loaded into FC, then that will work, we can
> > extend FC to return a Field[] and make that cache aware.
>
> Have you looked at UninvertedField?  Not sure if it is what you are after,
> but it is essentially multivalued FC.
>
> >
> >
> > Sorting on multivalued is definitely a subjective matter that a function
> > query would rock in, having an FC or VS that supports it would make that
> > much easier, like say events where an event can have multiple dates,
> > sort_date_compared(performance_dates, NOW)
>
> I suppose if you have a multivalued function (see the Vector Distance
> stuff), you can do that already.  This is in fact how sort by distance works
> on trunk now.
>
> >
> > Or even distances from a poly, polyDistance(convexHull, point) or
> > polyDistance(center, point) etc..
>
> Yep.


[jira] Updated: (SOLR-1781) Replication index directories not always cleaned up

2010-02-19 Thread Terje Sten Bjerkseth (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terje Sten Bjerkseth updated SOLR-1781:
---

Attachment: 0001-Replication-does-not-always-clean-up-old-directories.patch

This patch seems to fix the problem, mostly.


> Replication index directories not always cleaned up
> ---
>
> Key: SOLR-1781
> URL: https://issues.apache.org/jira/browse/SOLR-1781
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 1.4
> Environment: Windows Server 2003 R2, Java 6b18
>Reporter: Terje Sten Bjerkseth
> Fix For: 1.5
>
> Attachments: 
> 0001-Replication-does-not-always-clean-up-old-directories.patch
>
>
> We had the same problem as someone described in 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e.
>  A partial copy of that message:
> We're using the new replication and it's working pretty well. There's  
> one detail I'd like to get some more information about.
> As the replication works, it creates versions of the index in the data  
> directory. Originally we had index/, but now there are dated versions  
> such as index.20100127044500/, which are the replicated versions.
> Each copy is sized in the vicinity of 65G. With our current hard drive  
> it's fine to have two around, but 3 gets a little dicey. Sometimes  
> we're finding that the replication doesn't always clean up after  
> itself. I would like to understand this better, or to not have this  
> happen. It could be a configuration issue.




[jira] Created: (SOLR-1781) Replication index directories not always cleaned up

2010-02-19 Thread Terje Sten Bjerkseth (JIRA)
Replication index directories not always cleaned up
---

 Key: SOLR-1781
 URL: https://issues.apache.org/jira/browse/SOLR-1781
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: Windows Server 2003 R2, Java 6b18
Reporter: Terje Sten Bjerkseth
 Fix For: 1.5


We had the same problem as someone described in 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e.
 A partial copy of that message:

We're using the new replication and it's working pretty well. There's  
one detail I'd like to get some more information about.

As the replication works, it creates versions of the index in the data  
directory. Originally we had index/, but now there are dated versions  
such as index.20100127044500/, which are the replicated versions.

Each copy is sized in the vicinity of 65G. With our current hard drive  
it's fine to have two around, but 3 gets a little dicey. Sometimes  
we're finding that the replication doesn't always clean up after  
itself. I would like to understand this better, or to not have this  
happen. It could be a configuration issue.
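
To make the symptom concrete, here is a minimal sketch of how leftover replicated directories could be identified, assuming they follow the index.<14-digit timestamp> naming seen above (for example index.20100127044500). It is not the attached patch and does not use any Solr API; the class name and the idea of passing in the directory currently in use are assumptions for illustration. It only selects names and deletes nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class StaleIndexDirs {

    // Matches timestamped replication directories such as index.20100127044500.
    private static final Pattern TIMESTAMPED = Pattern.compile("index\\.\\d{14}");

    // Returns the timestamped directory names that are not the one in use.
    // The plain "index" directory and unrelated names are never reported.
    static List<String> staleDirs(List<String> dirNames, String currentDir) {
        List<String> stale = new ArrayList<>();
        for (String name : dirNames) {
            if (TIMESTAMPED.matcher(name).matches() && !name.equals(currentDir)) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("index");
        names.add("index.20100127044500");
        names.add("index.20100128044500");
        System.out.println(staleDirs(names, "index.20100128044500"));
    }
}
```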






Re: poly fields in fieldcache

2010-02-19 Thread Grant Ingersoll

On Feb 18, 2010, at 1:54 PM, patrick o'leary wrote:

> Cool, we want to examine certain fields of a docset which currently is a
> multivalued field, obviously only the first value gets loaded into field
> cache.
> 
> But if a poly field can be loaded into FC, then that will work, we can
> extend FC to return a Field[] and make that cache aware.

Have you looked at UninvertedField?  Not sure if it is what you are after, but 
it is essentially multivalued FC.

> 
> 
> Sorting on multivalued is definitely a subjective matter that a function
> query would rock in, having an FC or VS that supports it would make that
> much easier, like say events where an event can have multiple dates,
> sort_date_compared(performance_dates, NOW)

I suppose if you have a multivalued function (see the Vector Distance stuff), 
you can do that already.  This is in fact how sort by distance works on trunk 
now.

> 
> Or even distances from a poly, polyDistance(convexHull, point) or
> polyDistance(center, point) etc..

Yep.
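
The sort_date_compared(performance_dates, NOW) idea from this thread can be sketched outside Solr as follows. This is a hedged illustration only: sort_date_compared is the poster's hypothetical function, and the choice of "smallest absolute distance between NOW and any of the event's dates" as the sort key is an assumption, since the thread notes that sorting on a multivalued field is subjective. No FieldCache or ValueSource API is used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearestDateSort {

    // Sort key for one event: the smallest absolute distance (in ms)
    // between 'now' and any of the event's dates. Events without dates
    // get Long.MAX_VALUE and therefore sort last.
    static long nearestDistance(long[] dates, long now) {
        long best = Long.MAX_VALUE;
        for (long d : dates) {
            best = Math.min(best, Math.abs(d - now));
        }
        return best;
    }

    // Returns the positions of the events ordered by that key, closest first.
    static List<Integer> sortByNearest(final List<long[]> events, final long now) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingLong((Integer i) -> nearestDistance(events.get(i), now)));
        return order;
    }

    public static void main(String[] args) {
        List<long[]> events = new ArrayList<>();
        events.add(new long[] {100L, 500L});
        events.add(new long[] {40L});
        events.add(new long[] {10L, 49L});
        System.out.println(sortByNearest(events, 50L));
    }
}
```

Reducing the multiple values to a single number per document is what lets an ordinary single-valued sort apply; a different reduction (earliest upcoming date, for instance) would only change nearestDistance.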

[jira] Updated: (SOLR-1536) Support for TokenFilters that may modify input documents

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1536:


Attachment: altering.patch

Updated patch - previous patch produced NPEs.

> Support for TokenFilters that may modify input documents
> 
>
> Key: SOLR-1536
> URL: https://issues.apache.org/jira/browse/SOLR-1536
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Attachments: altering.patch, altering.patch
>
>
> In some scenarios it's useful to be able to create or modify fields in the 
> input document based on analysis of other fields of this document. This need 
> arises e.g. when indexing multilingual documents, or when doing NLP 
> processing such as NER. However, currently this is not possible to do.
> This issue provides an implementation of this functionality that consists of 
> the following parts:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that 
> TokenFilter-s created from this factory may modify fields in a 
> SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this 
> concept, with a JUnit test.
> * DocumentBuilder modifications to support this functionality.




[jira] Updated: (SOLR-1536) Support for TokenFilters that may modify input documents

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1536:


Attachment: (was: altering.patch)

> Support for TokenFilters that may modify input documents
> 
>
> Key: SOLR-1536
> URL: https://issues.apache.org/jira/browse/SOLR-1536
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Attachments: altering.patch, altering.patch
>
>
> In some scenarios it's useful to be able to create or modify fields in the 
> input document based on analysis of other fields of this document. This need 
> arises e.g. when indexing multilingual documents, or when doing NLP 
> processing such as NER. However, currently this is not possible to do.
> This issue provides an implementation of this functionality that consists of 
> the following parts:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that 
> TokenFilter-s created from this factory may modify fields in a 
> SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this 
> concept, with a JUnit test.
> * DocumentBuilder modifications to support this functionality.




[jira] Updated: (SOLR-1535) Pre-analyzed field type

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1535:


Attachment: altering.patch

Oops .. previous patch produced NPEs. This one doesn't.

> Pre-analyzed field type
> ---
>
> Key: SOLR-1535
> URL: https://issues.apache.org/jira/browse/SOLR-1535
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Fix For: 1.5
>
> Attachments: preanalyzed.patch, preanalyzed.patch
>
>
> PreAnalyzedFieldType provides a functionality to index (and optionally store) 
> content that was already processed and split into tokens using some external 
> processing chain. This implementation defines a serialization format for 
> sending tokens with any currently supported Attributes (eg. type, posIncr, 
> payload, ...). This data is de-serialized into a regular TokenStream that is 
> returned in Field.tokenStreamValue() and thus added to the index as index 
> terms, and optionally a stored part that is returned in Field.stringValue() 
> and is then added as a stored value of the field.
> This field type is useful for integrating Solr with existing text-processing 
> pipelines, such as third-party NLP systems.




[jira] Updated: (SOLR-1535) Pre-analyzed field type

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1535:


Attachment: (was: altering.patch)

> Pre-analyzed field type
> ---
>
> Key: SOLR-1535
> URL: https://issues.apache.org/jira/browse/SOLR-1535
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Fix For: 1.5
>
> Attachments: preanalyzed.patch, preanalyzed.patch
>
>
> PreAnalyzedFieldType provides a functionality to index (and optionally store) 
> content that was already processed and split into tokens using some external 
> processing chain. This implementation defines a serialization format for 
> sending tokens with any currently supported Attributes (eg. type, posIncr, 
> payload, ...). This data is de-serialized into a regular TokenStream that is 
> returned in Field.tokenStreamValue() and thus added to the index as index 
> terms, and optionally a stored part that is returned in Field.stringValue() 
> and is then added as a stored value of the field.
> This field type is useful for integrating Solr with existing text-processing 
> pipelines, such as third-party NLP systems.




[jira] Updated: (SOLR-1535) Pre-analyzed field type

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1535:


Attachment: preanalyzed.patch

Sigh ... attach correct patch.

> Pre-analyzed field type
> ---
>
> Key: SOLR-1535
> URL: https://issues.apache.org/jira/browse/SOLR-1535
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Fix For: 1.5
>
> Attachments: preanalyzed.patch, preanalyzed.patch
>
>
> PreAnalyzedFieldType provides a functionality to index (and optionally store) 
> content that was already processed and split into tokens using some external 
> processing chain. This implementation defines a serialization format for 
> sending tokens with any currently supported Attributes (eg. type, posIncr, 
> payload, ...). This data is de-serialized into a regular TokenStream that is 
> returned in Field.tokenStreamValue() and thus added to the index as index 
> terms, and optionally a stored part that is returned in Field.stringValue() 
> and is then added as a stored value of the field.
> This field type is useful for integrating Solr with existing text-processing 
> pipelines, such as third-party NLP systems.




[jira] Updated: (SOLR-1535) Pre-analyzed field type

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1535:


Attachment: (was: preanalyzed.patch)

> Pre-analyzed field type
> ---
>
> Key: SOLR-1535
> URL: https://issues.apache.org/jira/browse/SOLR-1535
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Fix For: 1.5
>
> Attachments: preanalyzed.patch
>
>
> PreAnalyzedFieldType provides a functionality to index (and optionally store) 
> content that was already processed and split into tokens using some external 
> processing chain. This implementation defines a serialization format for 
> sending tokens with any currently supported Attributes (eg. type, posIncr, 
> payload, ...). This data is de-serialized into a regular TokenStream that is 
> returned in Field.tokenStreamValue() and thus added to the index as index 
> terms, and optionally a stored part that is returned in Field.stringValue() 
> and is then added as a stored value of the field.
> This field type is useful for integrating Solr with existing text-processing 
> pipelines, such as third-party NLP systems.




[jira] Updated: (SOLR-1536) Support for TokenFilters that may modify input documents

2010-02-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1536:


Attachment: altering.patch

Patch updated to trunk.

> Support for TokenFilters that may modify input documents
> 
>
> Key: SOLR-1536
> URL: https://issues.apache.org/jira/browse/SOLR-1536
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Attachments: altering.patch, altering.patch
>
>
> In some scenarios it's useful to be able to create or modify fields in the 
> input document based on analysis of other fields of this document. This need 
> arises e.g. when indexing multilingual documents, or when doing NLP 
> processing such as NER. However, currently this is not possible to do.
> This issue provides an implementation of this functionality that consists of 
> the following parts:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that 
> TokenFilter-s created from this factory may modify fields in a 
> SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this 
> concept, with a JUnit test.
> * DocumentBuilder modifications to support this functionality.




Hudson build is back to normal : Solr-trunk #1064

2010-02-19 Thread Apache Hudson Server
See 




[jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents

2010-02-19 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835660#action_12835660
 ] 

Andrzej Bialecki  commented on SOLR-1536:
-

Term frequency vectors are not available at this stage, unless you go to the 
expense of creating a MemoryIndex. I think the solution I proposed is less 
costly and more generic.

> Support for TokenFilters that may modify input documents
> 
>
> Key: SOLR-1536
> URL: https://issues.apache.org/jira/browse/SOLR-1536
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 1.5
>Reporter: Andrzej Bialecki 
> Attachments: altering.patch
>
>
> In some scenarios it's useful to be able to create or modify fields in the 
> input document based on analysis of other fields of this document. This need 
> arises e.g. when indexing multilingual documents, or when doing NLP 
> processing such as NER. However, currently this is not possible to do.
> This issue provides an implementation of this functionality that consists of 
> the following parts:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that 
> TokenFilter-s created from this factory may modify fields in a 
> SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this 
> concept, with a JUnit test.
> * DocumentBuilder modifications to support this functionality.
