[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073435#comment-15073435 ]
Dennis Gove commented on SOLR-7535:
-----------------------------------

I haven't looked at the patch yet, but to answer your questions:

1. The name of the collection in the URL path and the collection(s) named anywhere in the expression can absolutely be different. There are a couple of cases where this difference will most likely appear.

First, during a join or merge of multiple collections, only one of the collection names can be contained in the URL. For example:
{code}
innerJoin(
  search(people, fl="personId,name", q="*:*", sort="personId asc"),
  search(address, fl="personId,city", q="state:ny", sort="personId asc"),
  on="personId"
)
{code}
Two collections are being hit, but only one can be included in the URL. There are no hard and fast rules about which one should be used in the URL; that decision could depend on many factors, especially if the collections live in different clouds or on different hardware.

Second, the HTTP request may be sent to what is effectively an empty collection that exists only to perform parallel work using the Streaming API. For example, imagine you want to do some heavy metric processing but don't want to use more resources than necessary on the servers where the collections live. You could set up an empty collection on entirely different hardware, with the intent that this hardware acts solely as workers on the real collection. This lets you do the heavy lifting on hardware separate from where the collection actually lives.

For these reasons the collection name is a required parameter in the base streams (CloudSolrStream and FacetStream).

2. There are three types of parameters: positional, unnamed, and named.

*Positional parameters* are those which must exist in a specific location in the expression. IIRC, the only positional parameters are the collection names in the base streams.
This is done because the collection name is critical, so we can say it is always the first parameter, regardless of anything else included.

*Unnamed parameters* are those whose meaning can be determined from the content of the parameter. For example:
{code}
rollup(
  search(people, fl="personId,name,age", q="*:*", sort="personId asc"),
  max(age),
  min(age),
  avg(age)
)
{code}
In this example we know that search(...) is a stream and that max(...), min(...), and avg(...) are metrics. Unnamed parameters are also very useful when the number of parameters of a given type is not fixed. In the example above one could provide any number of metrics, and by keeping them unnamed the user can simply keep adding new metrics without worrying about names. Another example of this is MergeStream, where one can merge two or more streams together.

*Named parameters* are used when you want to be very clear about what a particular parameter is for. For example, the "on" parameter in a join clause indicates that the join should be done on some field (or fields). HashJoinStream is an interesting case because it has a named parameter, "hashed", whose value must be a stream. Here a named parameter was chosen to make it very clear to the user which stream is being hashed and which one is not. Generally it comes down to whether a parameter name would make things clearer for the user.

> Add UpdateStream to Streaming API and Streaming Expression
> ----------------------------------------------------------
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
> Issue Type: New Feature
> Components: clients - java, SolrJ
> Reporter: Joel Bernstein
> Priority: Minor
> Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and
> streaming expressions.
> The UpdateStream will wrap a TupleStream and send the
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different SolrCloud collections,
> merge and transform the streams, and send the transformed data to another
> SolrCloud collection.
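Point 1 in the comment can be made concrete with a small sketch. Solr's /stream handler takes the expression in its expr parameter; the collection in the URL path only decides where the request is sent, while the collections actually read (people and address) come from inside the expression. The host, port, and the "workers" collection (standing in for the empty worker collection described above) are assumptions, and a GET URL is used only to keep the sketch short; in practice the expression is commonly sent URL-encoded in a POST body.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: "workers" is a hypothetical, otherwise empty collection used
// purely to pick which nodes run the expression; the collections actually
// queried (people, address) are named inside the expression itself.
class StreamRequestUrl {

    // Builds a URL for the /stream handler of urlCollection, with the
    // streaming expression URL-encoded into the expr parameter.
    static String build(String solrBase, String urlCollection, String expr) {
        String encoded = URLEncoder.encode(expr, StandardCharsets.UTF_8);
        return solrBase + "/" + urlCollection + "/stream?expr=" + encoded;
    }

    public static void main(String[] args) {
        String expr = "innerJoin("
                + "search(people, fl=\"personId,name\", q=\"*:*\", sort=\"personId asc\"), "
                + "search(address, fl=\"personId,city\", q=\"state:ny\", sort=\"personId asc\"), "
                + "on=\"personId\")";
        // Hypothetical host and port.
        System.out.println(build("http://localhost:8983/solr", "workers", expr));
    }
}
```

Which collection goes in the path is a deployment choice, as the comment says; the sketch just shows that the two are independent.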
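The three parameter types from point 2 can be illustrated with a small classifier. This is a hypothetical helper written for illustration, not Solr's own expression parser: it splits the top-level parameters of a single well-formed expression (ignoring commas nested in parentheses or double quotes) and labels each one with the rules described in the comment. The set of base-stream names (search and facet) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration, NOT Solr's expression parser.
// Assumes one well-formed expression such as fn(p1, p2, name=value).
class ParamKinds {

    enum Kind { POSITIONAL, NAMED, UNNAMED }

    // Assumption: expressions whose first parameter is a collection name.
    static final Set<String> BASE_STREAMS = Set.of("search", "facet");

    // Returns the top-level parameters, ignoring commas that are nested
    // inside parentheses or inside double-quoted values.
    static List<String> topLevelParams(String expr) {
        String body = expr.substring(expr.indexOf('(') + 1, expr.lastIndexOf(')'));
        List<String> params = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inQuotes = false;
        for (char c : body.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                depth--;
            } else if (!inQuotes && depth == 0 && c == ',') {
                params.add(current.toString().trim()); // parameter boundary
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        if (!current.toString().trim().isEmpty()) {
            params.add(current.toString().trim());
        }
        return params;
    }

    // name=value at the top level -> NAMED; the first parameter of a base
    // stream -> POSITIONAL; anything else (a stream or metric whose role is
    // clear from its content) -> UNNAMED.
    static Kind classify(String function, int index, String param) {
        int eq = param.indexOf('=');
        int paren = param.indexOf('(');
        if (eq > 0 && (paren < 0 || eq < paren)) {
            return Kind.NAMED;
        }
        if (index == 0 && BASE_STREAMS.contains(function)) {
            return Kind.POSITIONAL;
        }
        return Kind.UNNAMED;
    }

    // Labels every top-level parameter, e.g. "NAMED: on=\"personId\"".
    static List<String> describe(String expr) {
        String function = expr.substring(0, expr.indexOf('(')).trim();
        List<String> params = topLevelParams(expr);
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < params.size(); i++) {
            labels.add(classify(function, i, params.get(i)) + ": " + params.get(i));
        }
        return labels;
    }

    public static void main(String[] args) {
        describe("innerJoin(search(people, q=\"*:*\"), search(address, q=\"state:ny\"), on=\"personId\")")
                .forEach(System.out::println);
    }
}
```

Applied to search(people, fl="personId,name", q="*:*", sort="personId asc"), describe labels people as POSITIONAL and fl, q, and sort as NAMED; every parameter of the rollup example comes out UNNAMED, and in a hashJoin the hashed=search(...) parameter is NAMED even though its value is a stream.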
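The UpdateStream described in the ticket is essentially a decorator: it wraps another stream, reads its tuples, and sends them somewhere to be indexed. The following is a minimal sketch of that idea under simplifying assumptions, not the implementation in the attached patch. SimpleTupleStream, ListStream, and BatchingUpdateStream are hypothetical names; read() returning null stands in for end-of-stream; the Consumer sink stands in for sending documents to a SolrCloud collection; and the batch size and the batchIndexed summary field are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a tuple stream (not Solr's TupleStream):
// read() returns the next tuple, or null once the stream is exhausted.
interface SimpleTupleStream {
    Map<String, Object> read();
}

// In-memory source so the sketch runs without a Solr cluster.
class ListStream implements SimpleTupleStream {
    private final Iterator<Map<String, Object>> tuples;

    ListStream(List<Map<String, Object>> tuples) {
        this.tuples = tuples.iterator();
    }

    public Map<String, Object> read() {
        return tuples.hasNext() ? tuples.next() : null;
    }
}

// Decorator in the spirit of the proposed UpdateStream: pulls tuples from the
// wrapped stream, passes them to an indexing sink in batches of batchSize, and
// emits one summary tuple per batch. The sink stands in for sending the batch
// to a SolrCloud collection.
class BatchingUpdateStream implements SimpleTupleStream {
    private final SimpleTupleStream source;
    private final Consumer<List<Map<String, Object>>> sink;
    private final int batchSize;

    BatchingUpdateStream(SimpleTupleStream source,
                         Consumer<List<Map<String, Object>>> sink,
                         int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.source = source;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    public Map<String, Object> read() {
        List<Map<String, Object>> batch = new ArrayList<>();
        Map<String, Object> tuple;
        while (batch.size() < batchSize && (tuple = source.read()) != null) {
            batch.add(tuple);
        }
        if (batch.isEmpty()) {
            return null; // wrapped stream is exhausted
        }
        sink.accept(batch); // hypothetical: one indexing request per batch
        return Map.of("batchIndexed", batch.size());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            docs.add(Map.of("id", "doc" + i));
        }
        SimpleTupleStream update = new BatchingUpdateStream(
                new ListStream(docs), batch -> System.out.println("indexing " + batch), 2);
        for (Map<String, Object> t = update.read(); t != null; t = update.read()) {
            System.out.println(t);
        }
    }
}
```

Because the wrapper implements the same interface as the stream it wraps, it can sit on top of any merge or transform pipeline built from that interface, which is the property that lets data pulled from several collections be transformed and then indexed into another.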