On Sat, 18 Jul 2020, Artur Tarassow wrote:

> On 17.07.20 at 16:32, Allin Cottrell wrote:
>> On Fri, 17 Jul 2020, Artur Tarassow wrote:
>>
>>> On 16.07.20 at 14:53, Allin Cottrell wrote:
>>>> On Wed, 15 Jul 2020, Allin Cottrell wrote:
>>>>
>>>>> On Wed, 15 Jul 2020, Artur Tarassow wrote:
>>>>>
>>>>>> But what about the case when adding the "--permanent" flag?
>>>>>
>>>>> I can see a case for shrinking the strings array when the --permanent option is given, though it's not totally clear-cut.
>>>>
>>>> Here's a follow-up. You could think of this as a prototype of what we might do internally with a string-valued series on permanent sub-sampling.
>>>
>>> Sorry for the late reply, Allin. Yes, that looks good to me.
>>>
>>> But would "permanent sub-sampling" mean that this only applies when executing the smpl command with the --permanent flag? Or would it also apply when storing a sub-sampled data set?
>>
>> I favour doing this only when the --permanent option is given. It's an information-destroying move, and I can imagine cases where one wants to save a sub-sample and yet not lose the information in question. But if you _want_ to lose it, without using --permanent, then just store in a format other than gdt or gdtb.
>
> I understand your point, Allin. And getting it to work with the --permanent option would be very useful.
>
> But let me think out loud about some of my use cases -- maybe others have to deal with similar ones...

[cases where using the --permanent option with "smpl" would clearly not be convenient]

OK, here's what's now in git (not yet in snapshots; I'd prefer to see some testing first):

(1) Imposing a sample restriction with the --permanent option results in "trimming" of string-valued series: only string values that appear within the sub-sample are preserved, and the numeric coding for such series is adjusted accordingly. Note that this means any given observation will have the same string value as it had in the full dataset, but may not have the same numeric code.

(2) When using the "store" command with a native target (gdt or gdtb) there's a new option --trim-strvals which has a similar effect. We achieve this as follows:

* Any string-valued series are first backed up (copied in RAM).

* Before we actually write the data file we "trim" as described above.

* Once the write is finished we restore the full form of the string-valued series.

So you can sub-sample, store the data in trimmed form, then restore the full dataset without loss of information -- or at least that's the idea! This has worked OK in my limited testing today, but more testing is wanted.
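For anyone who wants to test against a reference, here is a minimal Python sketch of the two behaviours described in (1) and (2). It is not gretl's actual C code. It assumes a string-valued series is held as 1-based integer codes into a list of labels, with None marking a missing value. It also assumes that the surviving labels keep their original relative order. The names `trim_strvals`, `restrict_permanently`, `store_trimmed` and the `write` callback are all hypothetical.

```python
import copy
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Code = Optional[int]  # 1-based index into a labels list; None = missing
StrSeries = Tuple[List[Code], List[str]]  # (codes, labels): hypothetical layout


def trim_strvals(codes: Sequence[Code], labels: Sequence[str]) -> StrSeries:
    """Keep only the labels some observation uses and renumber the codes.

    Each observation keeps its string value; its numeric code may change.
    Surviving labels keep their original relative order (an assumption).
    """
    used = sorted({c for c in codes if c is not None})
    remap = {old: new for new, old in enumerate(used, start=1)}
    new_codes = [None if c is None else remap[c] for c in codes]
    new_labels = [labels[old - 1] for old in used]
    return new_codes, new_labels


def restrict_permanently(codes: Sequence[Code], labels: Sequence[str],
                         keep: Sequence[bool]) -> StrSeries:
    """Point (1): drop observations outside the sample, then trim."""
    sub = [c for c, k in zip(codes, keep) if k]
    return trim_strvals(sub, labels)


def store_trimmed(dataset: Dict[str, StrSeries],
                  write: Callable[[Dict[str, StrSeries]], None]) -> None:
    """Point (2): back up, trim, write, then restore the full series.

    `dataset` is assumed to hold only the currently active observations.
    """
    backup = copy.deepcopy(dataset)              # back up (copy in RAM)
    try:
        for name, (codes, labels) in list(dataset.items()):
            dataset[name] = trim_strvals(codes, labels)   # trim
        write(dataset)                           # the actual file write
    finally:
        dataset.clear()
        dataset.update(backup)                   # restore the full form


if __name__ == "__main__":
    labels = ["Ford", "GM", "IBM"]
    print(restrict_permanently([1, 2, 3, 2, 1], labels,
                               [False, True, True, True, False]))
    # ([1, 2, 1], ['GM', 'IBM'])

    data = {"firm": ([2, 3, 2], labels[:])}
    written = []
    store_trimmed(data, lambda d: written.append(copy.deepcopy(d)))
    print(written[0]["firm"])  # ([1, 2, 1], ['GM', 'IBM'])
    print(data["firm"])        # ([2, 3, 2], ['Ford', 'GM', 'IBM'])
```

The try/finally is a design choice of this sketch. It restores the full string tables even if the write step fails, so an I/O error cannot leave the in-memory dataset trimmed.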

One further remark: "store --trim-strvals" will work even when there's no sub-sample in place, in case the dataset contains any redundant string values. I hadn't noticed before, but gretl's grunfeld.gdt contains a redundant 11th firmname, "American Steel" (there are only 10 firms in our dataset). You can remove that by using store with the new option.

Allin
_______________________________________________
Gretl-users mailing list -- gretl-users@gretlml.univpm.it
To unsubscribe send an email to gretl-users-le...@gretlml.univpm.it
Website: https://gretlml.univpm.it/postorius/lists/gretl-users.gretlml.univpm.it/
