Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17?

David Adams via 4D_Tech Sat, 15 Sep 2018 00:08:51 -0700

Short version:
I need to load some fields from records into a big text thingy.


The code runs on the server-side only.

I'm keen to preserve RAM.

What are the trade-offs in V17 between *GOTO SELECTED RECORD* and *SELECTION
TO ARRAY*? I've been using *SELECTION TO ARRAY*, but it's hard to read,
write, and maintain. And, I realized, it might be de-optimized for memory
because you have to load all of the data you're processing into arrays.
(Yes, you can chunk it, but that doesn't change the fundamental point that
you pre-load a lot of data.)

Any test results or thoughts? I considered a fair range of options and did
comparison tests on none. The long version below includes more details on
the two solutions I'm down to, plus the ideas that I discarded.


TL;DR version
I'm working in V17 and I'm hoping that someone has done some real-world
tests already that could help me out with a question. Here's the setup: I
need to load up some fields from lots of records and push them into an
external system. It's going to Postgres, but that's not an important
detail, the result is a ginormous text object. The result could just as
well be a text or JSON file dump. The main constraint is available memory.
Performance matters when there are millions of records but, typically, the
only important consideration is memory. As far as the final solution goes,
it's ideally code that's easy to write, read, and maintain. As a plus, we
can position the code to run server side, so client-server optimization
isn't an issue. And, for the record, in lots of cases there isn't enough
data to make memory an issue at all, so readable reliable code is
definitely a preference.

Note: Yes, I can chunk data in ranges, etc. to keep things within my memory
footprint. I'm doing that...but the question still remains.

Here are the solutions I've come up with:

*QUERY* and a *For* loop with *GOTO SELECTED RECORD*.
Easy to read, write and maintain. But when you use *GOTO SELECTED RECORD*,
do you get the whole record in V17? Without fat fields? Since this is
server-side or stand-alone, should I care? On the upside, you're only
loading one record at a time, so only burning through memory for that
record while you use it.
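
For reference, a minimal sketch of this approach might look like the following. It assumes the [Stuff] table and a numeric [Stuff]Counter field, as in the query example later in this post. The variable names are made up for illustration, and the READ ONLY call is an addition here so that a read-only export does not lock records.

```4d
C_LONGINT($i;$count)
C_TEXT($output_text)

READ ONLY([Stuff])  // Assumption: the export never modifies records
QUERY([Stuff];[Stuff]Counter>=10000)
$count:=Records in selection([Stuff])
$output_text:=""

For ($i;1;$count)
	GOTO SELECTED RECORD([Stuff];$i)  // Makes record $i of the selection current
	$output_text:=$output_text+String([Stuff]Counter)+Char(Carriage return)
End for
```

Only one record is current at a time, but $output_text still grows with every row, so the output itself remains a memory cost.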

*SELECTION TO ARRAY* and a *For* loop
This is what I have been doing....based on old habits as much as anything.
Yes, you only get the columns you want, but it gets _all_ of the rows at
once. So, you burn up a lot of memory with the arrays and then duplicate++
that memory when building up the output. On the code side, that kind of
*SELECTION TO ARRAY*-loop-read-by-index code is ugly, tedious to write, and
tedious to maintain. It's clear(ish) and reliable, but only worth it if it
pays for itself somehow. In other words, it has to be a good deal better
than *GOTO SELECTED RECORD* to be worth it. Says the guy who has been doing
all *SELECTION TO ARRAY* forever.
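
A minimal sketch of the chunked variant, assuming the same [Stuff] table and a numeric [Stuff]Counter field. The chunk size and variable names are arbitrary placeholders. SELECTION RANGE TO ARRAY loads only the requested slice of the current selection, so the array holds at most one chunk at a time.

```4d
C_LONGINT($start;$end;$count;$chunk_size;$i)
C_TEXT($output_text)
ARRAY LONGINT($counter_al;0)

QUERY([Stuff];[Stuff]Counter>=10000)
$count:=Records in selection([Stuff])
$chunk_size:=10000  // Hypothetical value; tune to the memory budget
$output_text:=""
$start:=1

While ($start<=$count)
	$end:=$start+$chunk_size-1
	If ($end>$count)
		$end:=$count  // Clamp the last chunk to the end of the selection
	End if
	SELECTION RANGE TO ARRAY($start;$end;[Stuff]Counter;$counter_al)
	For ($i;1;Size of array($counter_al))
		$output_text:=$output_text+String($counter_al{$i})+Char(Carriage return)
	End for
	$start:=$end+1
End while
```

This shows the bookkeeping that chunking adds on top of the plain array loop. Whether it pays for itself against *GOTO SELECTED RECORD* is the untested part.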

Entity Selection and a *For* or *For each* loop
I have no clue why an entity selection is *C_OBJECT* instead of
*C_COLLECTION*, to give you a sense of how much I know about this stuff. I
was happy to discover that you can easily create an entity selection from a
current selection, so old style queries work fine:

*C_OBJECT*($stuff_es)
*QUERY*([Stuff];[Stuff]Counter>=10000)
$stuff_es:=*Create entity selection*([Stuff])

The resulting *For*/*For each* loop code is very readable, it's == *GOTO
SELECTED RECORD*, but with a different syntax. Otherwise, same same. I
*suspect* that the memory use here is excellent. I'm guessing that as you
navigate through the entity selection, you're only really pulling the data
you use. But maybe not. If you do a For each, you get an object (entity)
with all of the fields. So, possibly this approach is even worse than *GOTO
SELECTED RECORD* which, I'm guessing, doesn't load as many fields. I
haven't tested these points out in any way. If anyone has dug into this, it
would be great to know about the difference (if any) in what 4D loads when
you:

-- Use *GOTO SELECTED RECORD*

-- Use a *For each* loop on an entity selection, which builds an
$entity_object which you can then read/write to/from like $entity_object.ID

-- Use a *For* loop on an entity selection and then reference
$specific_es[0].ID
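
As a rough sketch, the third option (a *For* loop with index notation) could look like the code below. It assumes $stuff_es was created with *Create entity selection* as shown earlier and that Counter is a numeric attribute of [Stuff]. What 4D actually loads on each $stuff_es[$i] access is exactly the open question, so this says nothing about memory behavior.

```4d
C_OBJECT($stuff_es)
C_LONGINT($i)
C_TEXT($output_text)

QUERY([Stuff];[Stuff]Counter>=10000)
$stuff_es:=Create entity selection([Stuff])
$output_text:=""

// Entity selections are 0-indexed, so the last valid index is length-1
For ($i;0;$stuff_es.length-1)
	$output_text:=$output_text+String($stuff_es[$i].Counter)+Char(Carriage return)
End for
```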

It's pretty easy to imagine different ways that 4D might have implemented
things that are more or less efficient in each of these cases. I have no
idea what they actually did. I'm kind of curious about this behavior in V17,
but have already talked myself out of using entity selections. Why? Because
the table and field references are brittle and *case-sensitive*. Man, I
truly hate case-sensitive names. When do I want them? Never. Not once, and
I never will. This isn't all on 4D, many languages are case-sensitive. It
makes sense if you're a computer. I'm not a computer, I'm a person...to me
it's just horrible. Anyway, not exclusively a 4D problem...because in 4D you
can avoid it altogether.

For those that haven't been following along at home, here's a hello world
level V17 For each loop over an entity selection:

*C_OBJECT*($stuff_object)
*For each* ($stuff_object;$stuff_es) // The loop automatically populates $stuff_object as it iterates through the list.
$output_text:=$output_text+$stuff_object.ID+*Char*(*Carriage return*)
*End for each*

See that $stuff_object.ID statement? The ID part = [Stuff]ID. It's all
case-sensitive. Rename the field in the structure to id three months from
now and the code above breaks. And for "breaks", you don't get a compiler
error, you don't get a syntax error in the Method Editor, and you likely
don't get a runtime error. Your code just screws up silently. So, yeah, not
going that way.

*Note*: Collections are very handy when the source data is a big static
JSON. It makes the static values highly interactive. I wrote a little
screen like that last week and loved the results.

*Note*: In a *For each* loop, I can't find a way to read the index of the
current item. Like, that you're on item 23. You can get the total item
count with .length, but I see no way to get the current index. Or on
collections. It can be useful when you've got a progress indicator to
update. You can always roll your own $index:=$index+1 sort of thing.
Reminder: All of the new V17 stuff is 0 (offset) indexed, not 1 (position)
indexed.
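
The roll-your-own counter can be sketched as follows, assuming $stuff_es already holds an entity selection and Counter is a numeric attribute of [Stuff]. $index starts at 0 to match the 0-based convention. The progress update is left as a comment because it depends on your own UI code.

```4d
C_OBJECT($stuff_object)
C_LONGINT($index)
C_TEXT($output_text)

$index:=0
$output_text:=""
For each ($stuff_object;$stuff_es)
	// $index is the 0-based position of $stuff_object within $stuff_es
	$output_text:=$output_text+String($stuff_object.Counter)+Char(Carriage return)
	// Update a progress indicator here using $index and $stuff_es.length
	$index:=$index+1
End for each
```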

Honorable mention: *Selection to JSON*
Yeah, kind of nice...a very excellent command in some situations. In this
case, wildly wrong, I'd say. You load the whole JSON in one go so you get
your source data + formatting + names. It's pretty flabby. Then you have to
parse and walk that to get the proper text. If 4D had a Selection to text
(->Table;Template) system that was *not* JSON, I'd be golden. That would be
perfect. The *Selection to JSON* code doesn't allow in-line functions, so
there's that. Oh, wait, 4D does have a command like this...*PROCESS 4D TAGS*.
Hmmm. Yeah, probably the best approach for memory and the worst for
brittleness. Not going there.

Okay, so does anyone have any relevant, V17-based test results yet? I don't
have the time or appetite to do the tests myself and won't be surprised if
no one else has either. Not to be a **** about it, but I'm only interested
in *test results*. It's fun to estimate program behavior from first
principles, but it has pretty much zero predictive value. Having just spewed
out a bunch of speculation, I certainly can't hold it against anyone else
for riffing too.

I've spent some embarrassing number of hours (for hours read "months") of
my life testing 4D performance and, well, you have to test to find out.
Conventional wisdom tends to be *worse* than random guessing. It's great to
hear theories and stories from the folks at 4D, but that's all they
are...stories and theories. Background information can give you a better
idea of what to test and where to look, but that's all. Modern machines +
modern OS + 4D + your code + all of the various subcomponents (RAM,
network, SSD)...it's a lot. So, it's not a criticism to say only testing
can hope to turn up meaningful results. Given all of those factors, narrow
tests are ideal and obviously can't be generalized too far. Still, lots
better than speculation!

Thanks.
**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************
