Mike Thomsen created SOLR-9525:
----------------------------------

             Summary: split() function for streaming
                 Key: SOLR-9525
                 URL: https://issues.apache.org/jira/browse/SOLR-9525
             Project: Solr
          Issue Type: Wish
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Mike Thomsen


This is the original description I posted on solr-user:

I read this article and thought it could be an interesting way to do ingestion:

https://dzone.com/articles/solr-streaming-expressions-for-collection-auto-upd-1

Example from the article:

daemon(id="12345",
 runInterval="60000",
 update(users,
  batchSize=10,
  jdbc(connection="jdbc:mysql://localhost/users?user=root&password=solr",
   sql="SELECT id, name FROM users", sort="id asc", driver="com.mysql.jdbc.Driver")
 )
)

What's the best way to handle a multivalue field using this API? Is there a way 
to tokenize something returned in a database field?

Joel Bernstein responded with this:

Unfortunately there currently isn't a way to split a field. But this would
be nice functionality to add.

The approach would be to add a split operation that would be used by the
select() function. It would look like this:

select(jdbc(...), split(fieldA, delim=","), ...)

This would make a good jira issue.


So the TL;DR version is that I need the ability, in such a streaming operation,
to specify certain fields to tokenize into multivalue fields. In one schema I
may have to support, there are probably half a dozen such fields.
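
To make the request concrete, here is a rough sketch in Python (not Solr code) of the behavior being asked for: each tuple coming out of the inner stream has selected string fields split on a delimiter into a list of values. The function name split_fields, the "roles" column, and the choice to trim whitespace and drop empty tokens are all assumptions for illustration; the real operation would presumably live inside select() as described above.

```python
def split_fields(tuples, fields, delim=","):
    """Yield a copy of each tuple with the named string fields split into lists.

    Hypothetical helper for illustration only; this is not part of Solr.
    """
    for tup in tuples:
        out = dict(tup)  # copy so the input tuple is left untouched
        for name in fields:
            value = out.get(name)
            if isinstance(value, str):
                # Assumed behavior: trim whitespace and drop empty tokens.
                out[name] = [tok.strip() for tok in value.split(delim) if tok.strip()]
        yield out


# Stand-in for rows returned by the jdbc() stream; "roles" is a made-up column.
rows = [
    {"id": 1, "name": "alice", "roles": "admin, editor"},
    {"id": 2, "name": "bob", "roles": None},
]
docs = list(split_fields(rows, ["roles"]))
```

Fields that are missing or not strings are passed through unchanged, so a NULL database column would simply stay empty rather than become a one-element list.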

Perhaps I am missing a feature here, but it looks like this new capability
cannot handle multivalue fields until something like this is in place.



