[jira] [Commented] (SOLR-16812) Support CBOR format for update/query

Jason Gerlowski (Jira) Mon, 05 Jun 2023 14:11:06 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729477#comment-17729477
 ]


Jason Gerlowski commented on SOLR-16812:
----------------------------------------

bq. It has 1100 docs.  How often do we index/fetch more than 1100 docs?

For me the relevant number isn't the number of documents; it's the size of the 
request/response in bytes.  "films.json" is hardly half a megabyte.  How often 
does a Solr response exceed that?  Absolutely all the time.

bq. Here is a benchmark from the wild.

I appreciate that this golang ser-de experiment found CBOR to be faster than 
JSON in that one golang library.

But a benchmark "from the wild" doesn't really tell the community anything 
about the performance of the CBOR code that was committed to Solr this morning. 
 And neither does the JUnit test you linked to.  (see the PR review 
[here|https://github.com/apache/solr/pull/1655#pullrequestreview-1462820682] 
for specific concerns)

How does Solr's new CBOR support compare to Solr's support for JSON (for 
non-SolrJ users) or for javabin (for our current SolrJ users)?

That's what I'm asking about, and it's still an open question as far as I can 
tell.  You're right that Solr-CBOR vs. Solr-JSON should be a slam dunk, but 
it's an important sanity-check.  And Solr-CBOR vs Solr-javabin is an important 
datapoint to inform how aggressively javabin users might want to switch to CBOR.
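
A minimal sketch of what such a comparison harness could look like, assuming each format is exercised through a caller-supplied function that returns the raw response bytes. The names `measure` and `compare` are hypothetical and not part of Solr or SolrJ; the sketch only shows the shape of a size-plus-latency comparison with warm-up runs discarded.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(fetch: Callable[[], bytes], warmup: int = 3, runs: int = 10) -> Dict[str, float]:
    """Call fetch() repeatedly; report payload size in bytes and median latency in ms."""
    for _ in range(warmup):
        fetch()  # discard warm-up runs (caches, connection setup, JIT on the server)
    timings_ms: List[float] = []
    size = 0
    for _ in range(runs):
        start = time.perf_counter()
        body = fetch()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
        size = len(body)  # size of the last response, in bytes
    return {"bytes": size, "median_ms": statistics.median(timings_ms)}


def compare(fetchers: Dict[str, Callable[[], bytes]], **kwargs) -> Dict[str, Dict[str, float]]:
    """Run measure() for every named fetcher, e.g. one per response format."""
    return {name: measure(fetch, **kwargs) for name, fetch in fetchers.items()}
```

Each fetcher could wrap an HTTP request to the same query with a different response format. The endpoint and the parameter value that selects CBOR are assumptions to check against the committed code. Comparing at several response sizes, not just the half-megabyte "films.json" case, would speak to the bytes question raised above.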

bq. The point is most of these binary formats are much better than JSON.

Sure.  But that doesn't make them all the same.  Binary formats have tradeoffs 
in performance, popularity, compatibility w/ various languages, etc.  Some are 
going to be better for Solr on the whole than others.  I'm sure you considered 
these tradeoffs in picking CBOR over other binary formats.  I just want to hear 
a little more about that, if I can.

"I have done benchmarks" Great!  Meaning the JUnit tests that I commented on in 
your PR?  Or something else?  What did those look like?

"Avro is not considered because there is no jackson support"  [Jackson does 
support Avro|https://github.com/FasterXML/jackson-dataformats-binary], 
afaict?  As it does a number of other formats (Smile, etc.)

bq. javabin must go(if possible) [...but] it's a non-trivial task

Ugh, yeah.  Very little in Solr these days is trivial.

But at the same time - I think the project would suffer if we were to punt on 
this entirely.  The scope here is waaay smaller, but this is the same dynamic 
that's given us 3 (or is it 4?) different faceting modules!

If you're unwilling to tackle javabin deprecation proper, would you be willing 
to at least put together a writeup of what the steps would be and what the 
hurdles are?

> Support CBOR format for update/query
> ------------------------------------
>
>                 Key: SOLR-16812
>                 URL: https://issues.apache.org/jira/browse/SOLR-16812
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Javabin is quite efficient and fast, but non-Java users have to use JSON 
> exclusively.
>  
> [CBOR|http://example.com/] is a widely used format that is supported by most 
> languages. 
>  
> Here is a benchmark of updating using CBOR vs. JSON with our films.json:
> {code:java}
> Payload size (bytes)
> ====================
> json    : 633600
> cbor    : 290672
> javabin : 234520
>
> Time taken to index
> ===================
> json    : 583 ms
> cbor    : 509 ms
> javabin : 549 ms
>
> Time taken to query *:* (1100 docs)
> ===================================
> json    : 92 ms
> javabin : 70 ms
> cbor    : 63 ms{code}
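
For reference, the relative differences implied by the figures quoted above can be worked out with a few lines of Python. This is only arithmetic on the reported numbers, with all times taken to be milliseconds; it says nothing about how the measurements were made or how they would scale to larger payloads. The helper name `relative_to_json` is hypothetical.

```python
# Figures copied from the issue description above; times taken to be in ms.
payload_bytes = {"json": 633600, "cbor": 290672, "javabin": 234520}
index_ms = {"json": 583, "cbor": 509, "javabin": 549}
query_ms = {"json": 92, "cbor": 63, "javabin": 70}


def relative_to_json(values):
    """Express each value as a fraction of the JSON value."""
    return {name: value / values["json"] for name, value in values.items()}


for label, values in [("payload size", payload_bytes),
                      ("index time", index_ms),
                      ("query time", query_ms)]:
    ratios = relative_to_json(values)
    # Print each format as a percentage of the JSON figure.
    print(label, {name: f"{ratio:.0%}" for name, ratio in ratios.items()})
```

On these numbers the CBOR payload is a little under half the JSON size, while the indexing times differ far less than the payload sizes do.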



