Re: Allowing DESC for a PRIMARY KEY column

2024-03-29 Thread Mitar
Hi!

On Fri, Mar 29, 2024 at 9:41 PM Tom Lane  wrote:
> You would need a lot stronger case than "I didn't bother checking
> whether I really need this".

Thanks! I have tested it this way (based on your example):

create table t (id int not null, revision int not null);
create unique index on t (id, revision desc);
explain select * from t where id=123 order by revision desc limit 1;
                                      QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=0.15..3.45 rows=1 width=8)
   ->  Index Only Scan using t_id_revision_idx on t  (cost=0.15..36.35 rows=11 width=8)
         Index Cond: (id = 123)
(3 rows)

It is very similar, the only difference being the scan direction. Based
on [1] I was under the impression that "Index Only Scan Backward" is
much slower than "Index Only Scan", but from your answer it seems I
misunderstood and backward scanning is comparable to forward scanning?
I was going especially by this section:

"Consider a two-column index on (x, y): this can satisfy ORDER BY x, y
if we scan forward, or ORDER BY x DESC, y DESC if we scan backward.
But it might be that the application frequently needs to use ORDER BY
x ASC, y DESC. There is no way to get that ordering from a plain
index, but it is possible if the index is defined as (x ASC, y DESC)
or (x DESC, y ASC)."

I am curious, then: what is an example where the quote from [1] does
apply? Is it really just a query like ORDER BY id, revision DESC over
the whole table? One future query I am working on selects all rows, but
only for the latest (highest) revision of each id, so I am curious
whether the index direction will matter there.
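
For example (a sketch of what I mean, untested), I assume this is where
the mixed ordering matters:

-- With a plain index on (id, revision), this ordering cannot come straight
-- from the index, so an explicit sort (or incremental sort) step is needed:
explain select * from t order by id, revision desc;
-- With the (id, revision desc) index created above, the same query should
-- be satisfiable by an ordinary forward index scan, with no sort.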


Mitar

[1] https://www.postgresql.org/docs/16/indexes-ordering.html

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar




Allowing DESC for a PRIMARY KEY column

2024-03-29 Thread Mitar
Hi!

I have the same problem as [1]. I have a table something like this
(quoting "values" since it is a reserved word):

CREATE TABLE "values" (
  id int NOT NULL,
  revision int NOT NULL,
  data jsonb NOT NULL,
  PRIMARY KEY (id, revision)
);

And I would like to be able to specify PRIMARY KEY (id, revision DESC)
because the most common query I am making is:

SELECT data FROM "values" WHERE id=123 ORDER BY revision DESC LIMIT 1;

My understanding, based on [2], is that the primary key index cannot
help here, unless it is defined with DESC on revision. But this does
not seem to be possible. Would you entertain a patch adding this
feature? It seems pretty straightforward?
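
In the meantime, a workaround I am considering (an untested sketch) is
to keep the primary key for integrity and add a redundant unique index
with the descending column for this query:

-- Sketch: the primary key keeps enforcing uniqueness, while this extra index
-- provides the (id ASC, revision DESC) ordering, at the cost of maintaining
-- a second index:
CREATE UNIQUE INDEX ON "values" (id, revision DESC);

Allowing DESC directly in the PRIMARY KEY definition would avoid
maintaining both.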


Mitar

[1] 
https://stackoverflow.com/questions/45597101/primary-key-with-asc-or-desc-ordering
[2] https://www.postgresql.org/docs/16/indexes-ordering.html

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar




Re: Adding application_name to the error and notice message fields

2024-03-27 Thread Mitar
Hi!

Oh, I can use PQparameterStatus to obtain the application_name of the
current connection. It seems it is then not necessary to add this
information to notice messages.


Mitar

On Wed, Mar 27, 2024 at 4:22 PM Mitar  wrote:
>
> Hi!
>
> We take care to always set application_name to improve our log lines
> where we use %a in log_line_prefix to log application name, per [1].
> But notices which are sent to the client do not have the application
> name and are thus hard to attribute correctly. Could "a" be added with
> the application name (when available) to the error and notice message
> fields [2]?
>
>
> Mitar
>
> [1] 
> https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-LINE-PREFIX
> [2] https://www.postgresql.org/docs/current/protocol-error-fields.html
>
> --
> https://mitar.tnode.com/
> https://twitter.com/mitar_m
> https://noc.social/@mitar



-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar




Adding application_name to the error and notice message fields

2024-03-27 Thread Mitar
Hi!

We take care to always set application_name to improve our log lines
where we use %a in log_line_prefix to log application name, per [1].
But notices which are sent to the client do not have the application
name and are thus hard to attribute correctly. Could "a" be added with
the application name (when available) to the error and notice message
fields [2]?


Mitar

[1] 
https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-LINE-PREFIX
[2] https://www.postgresql.org/docs/current/protocol-error-fields.html

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar




[Wikidata-tech] Re: Timestamps with calendarmodel other than Q1985727 and Q1985786

2024-03-24 Thread Mitar
Hi!

There was no response here. I made the following issue instead:

https://phabricator.wikimedia.org/T360859


Mitar

On Sat, Mar 2, 2024 at 7:24 PM Mitar  wrote:
>
> Hi!
>
> Recently, a timestamp with calendarmodel
> https://www.wikidata.org/wiki/Q12138 has been introduced into
> Wikidata:
>
> https://www.wikidata.org/w/index.php?title=Q105958428=2004936527
>
> How is this possible? I thought that the only allowed values are
> Q1985727 and Q1985786?
>
>
> Mitar
>
> --
> https://mitar.tnode.com/
> https://twitter.com/mitar_m
> https://noc.social/@mitar



-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar
___
Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360859: Timestamps with calendarmodel other than Q1985727 and Q1985786

2024-03-24 Thread Mitar
Mitar created this task.
Mitar added a project: Wikidata.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Recently, a timestamp with calendarmodel https://www.wikidata.org/wiki/Q12138 
has been introduced into
  Wikidata: 
https://www.wikidata.org/w/index.php?title=Q105958428=2004936527
  
  I think this should not be possible and there should be checks to only allow 
values Q1985727 and Q1985786 as it is documented here: 
https://www.wikidata.org/wiki/Help:Dates#Time_datatype

TASK DETAIL
  https://phabricator.wikimedia.org/T360859

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Aklapper, Mitar, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-tech] Timestamps with calendarmodel other than Q1985727 and Q1985786

2024-03-02 Thread Mitar
Hi!

Recently, a timestamp with calendarmodel
https://www.wikidata.org/wiki/Q12138 has been introduced into
Wikidata:

https://www.wikidata.org/w/index.php?title=Q105958428=2004936527

How is this possible? I thought that the only allowed values are
Q1985727 and Q1985786?


Mitar

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar
___
Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org


Re: [go-nuts] Re: Measuring the total time of serving a request

2023-11-21 Thread Mitar
Hi!

On Mon, Nov 20, 2023 at 10:51 PM Uli Kunitz  wrote:
> You could convert the original ResponseWriter to a ResponseController and 
> call Flush in your middleware before you measure the duration. Alternatively 
> you can try to convert ResponseWriter to a http.Flusher and call Flush if the 
> conversion is successful.

Yes, I was thinking of something along those lines. But are there any
side effects from calling Flush after the main handler returns? For
example, the Write documentation says:
"if the total size of all written data is under a few KB and there are
no Flush calls, the Content-Length header is added automatically."

"Once the headers have been flushed (...), the request body may be unavailable."

My understanding is that after the main handler returns, none of this
matters anymore. But are there any other similar side effects?
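
For reference, the middleware I have in mind is roughly this (just a
sketch using http.NewResponseController from Go 1.20; the names are
mine):

import (
    "errors"
    "log"
    "net/http"
    "time"
)

func timed(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        // Flush whatever the handler left buffered before taking the end
        // time, so the measurement also covers handing the data to the kernel.
        err := http.NewResponseController(w).Flush()
        if err != nil && !errors.Is(err, http.ErrNotSupported) {
            log.Printf("flush failed: %v", err)
        }
        log.Printf("%s %s served in %s", r.Method, r.URL.Path, time.Since(start))
    })
}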


Mitar

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikPFW%3Dh5AJnW1QbMB8%3DKjNKd%3Dook6HYkz%2BuajsXkyBFDPw%40mail.gmail.com.


Re: [go-nuts] Re: Measuring the total time of serving a request

2023-11-20 Thread Mitar
Hi!

On Mon, Nov 20, 2023 at 10:26 AM Duncan Harris  wrote:
> Why do you care about buffering in Go vs the OS?

Just because I hope that in Go I have a better chance of knowing when
the data is written out than I do once it is in the OS.



Mitar

--
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikOnZJGAP9T4zFnWhk1COVjCGOGj9Za%2BmePEzSFz%2Bq1PdQ%40mail.gmail.com.


[go-nuts] Measuring the total time of serving a request

2023-11-19 Thread Mitar
Hi!

I would like to measure the total time of serving a request. From what
I have seen, one can generally do that with a middleware handler which
records a start time, calls ServeHTTP of the original handler, and,
once ServeHTTP returns, measures the time it took.

But I am not completely convinced that this really captures the full
time of serving a request (besides the time spent inside the stdlib to
pass the request to my code): ServeHTTP can call Write, which buffers
data to be sent, and return, but I would also like to account for the
time it took to send this data out (at least out of the program into
the kernel). Am I right that this time might not be captured with the
approach above? Is there a way to also measure the time it takes for
the buffers to be written out?

I am considering calling Flush in the middleware, before measuring the
end time. My understanding is that it would block until data is
written out. But I am not sure if that would have some other
unintended side effects?

Maybe measuring that extra time is not important in practice? I am
thinking of trying to measure it because it might be that the client's
connection is slow (perhaps maliciously so), and I would like to have
some data on how often that happens.
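
To make this concrete, the middleware I have in mind is roughly the
following (a sketch; the names are mine):

import (
    "log"
    "net/http"
    "time"
)

func timed(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        // This only measures until the handler returns; data buffered by
        // Write may not have reached the kernel yet. Calling Flush here,
        // before time.Since, is what I am considering to also capture that.
        log.Printf("%s %s took %s", r.Method, r.URL.Path, time.Since(start))
    })
}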


Mitar

-- 
https://mitar.tnode.com/
https://twitter.com/mitar_m
https://noc.social/@mitar

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikNMmJ3GLuec8dO9p%3DG4svSgFkOqRVh1G%2BL4%2BC-hKTKoeA%40mail.gmail.com.


[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2023-05-13 Thread Mitar
Mitar added a comment.


  Awesome! Thanks. This looks really amazing. I am not too convinced that we 
should introduce a different dump format, but changing the compression really 
seems to be low-hanging fruit.

TASK DETAIL
  https://phabricator.wikimedia.org/T222985

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Sascha, Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, 
bennofs, Busfault, Astuthiodit_1, Atieno, karapayneWMDE, Invadibot, 
maantietaja, jannee_e, ItamarWMDE, Akuckartz, holger.knust, Nandana, Lahi, 
Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331, Hokwelum
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2023-05-08 Thread Mitar
Mitar added a comment.


  I think it would be useful to have a benchmark with more options: JSON with 
gzip, bzip2 (decompressed with lbzip2), and zstd, and then the same for 
QuickStatements. Could you do that?

TASK DETAIL
  https://phabricator.wikimedia.org/T222985

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Sascha, Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, 
bennofs, Busfault, Astuthiodit_1, Atieno, karapayneWMDE, Invadibot, 
maantietaja, jannee_e, ItamarWMDE, Akuckartz, holger.knust, Nandana, Lahi, 
Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331, Hokwelum
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2023-02-13 Thread Mitar
Hi!

Done: https://bugs.chromium.org/p/chromium/issues/detail?id=1415291


Mitar

On Wed, Nov 2, 2022 at 11:46 PM Mike Taylor  wrote:
>
> Hi Mitar,
>
> This is really good feedback. Would you mind filing a bug at
> crbug.com/new? Feel free to respond here with the link.
>
> thanks,
> Mike
>
> On 11/1/22 11:25 AM, Mitar wrote:
> > Hi!
> >
> > After playing more with link preload header, I have found another
> > issue where it fails flat as a replacement for server push: where
> > response type depends on Accept header (e.g., APIs which can return
> > XML or JSON depending on Accept header, i.e., content negotiation).
> > You cannot really specify what is used as an Accept header with link
> > rel="preload" and as="fetch" and it looks like Chrome always sets
> > Accept: */*, even if you specify type="application/json".
> >
> >
> > Mitar
> >
> > On Sun, Jul 10, 2022 at 8:04 PM Mitar  wrote:
> >> Hi!
> >>
> >> On Sun, Jul 10, 2022 at 4:19 PM Patrick Meenan  
> >> wrote:
> >>> Presumably if you are protecting something behind HTTP Auth then the PUSH 
> >>> is happening AFTER the 401 challenge and the browser has made the 
> >>> follow-on request with the Authorization header.  If not, then the 
> >>> resources aren't actually protected by authorization and PUSH is 
> >>> bypassing the auth.
> >> I am not talking about HTTP Auth, but a simple SPA which fetches JSON
> >> data from REST API using Authorization header. The server-side can
> >> know that after loading the code, SPA will contact the REST API, so
> >> with HTTP2 you can push the JSON data to the client (setting
> >> Authorization header. as an expected header to be used in the request,
> >> and because it knows the logic of SPA it can also know the value of
> >> the Authorization header). This mitigates the question about whether
> >> you should embed this JSON into HTML response or provide it through
> >> API. Because you can push, there is little difference, but it can be
> >> seen as cleaner to separate HTML payload from data payload.
> >>
> >>> In the early-hints case, the 103 should presumably also be after the 
> >>> request has made it past HTTP Auth so any requests for subresources would 
> >>> use the cached credentials,
> >> That works with cookies, but not with REST API which uses bearer token
> >> in Authorization header. Or am I mistaken?
> >>
> >>> Content negotiation works fine for "as" types that have "accept" types 
> >>> associated with them.
> >> There are some issues opened for Chromium around that [1] [2], but
> >> maybe that is just implementation bug?
> >>
> >>> The PUSH case doesn't support custom authorization or content negotiation 
> >>> either.
> >> You can provide Authorization header in request headers in PUSH_PROMISE 
> >> frame?
> >>
> >>> PUSH is going to go away, the only real question is how long it will 
> >>> take, not if enough edge cases are found to keep it.
> >> I think I understand that. But I am trying to understand what are
> >> alternatives for the edge cases I care about. Initially it looked like
> >> Early Hints is the proposed alternative, but it looks like it does not
> >> support all use cases.
> >>
> >> Currently to me it looks like the best bet is to move the bearer token
> >> to Cookie header. That one might be included when doing a preload
> >> through Link header.
> >>
> >> [1] https://bugs.chromium.org/p/chromium/issues/detail?id=962642
> >> [2] https://bugs.chromium.org/p/chromium/issues/detail?id=1072144
> >>
> >>
> >> Mitar
> >>
> >> --
> >> http://mitar.tnode.com/
> >> https://twitter.com/mitar_m
> >
> >
>


-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAKLmikP19rmngMvjxC-n4cGrSYbMShXdvTYJzTT0Mi019pJwdg%40mail.gmail.com.


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2022-11-01 Thread Mitar
Hi!

After playing more with the Link preload header, I have found another
issue where it falls flat as a replacement for server push: when the
response type depends on the Accept header (e.g., APIs which can return
XML or JSON depending on the Accept header, i.e., content negotiation).
You cannot really specify what is used as the Accept header with link
rel="preload" and as="fetch", and it looks like Chrome always sets
Accept: */*, even if you specify type="application/json".
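
For example (the path is made up), a preload hint like:

Link: </api/data.json>; rel="preload"; as="fetch"; type="application/json"

still results in Chrome making the preload request with Accept: */*, so
the server cannot tell which representation is wanted.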


Mitar

On Sun, Jul 10, 2022 at 8:04 PM Mitar  wrote:
>
> Hi!
>
> On Sun, Jul 10, 2022 at 4:19 PM Patrick Meenan  wrote:
> > Presumably if you are protecting something behind HTTP Auth then the PUSH 
> > is happening AFTER the 401 challenge and the browser has made the follow-on 
> > request with the Authorization header.  If not, then the resources aren't 
> > actually protected by authorization and PUSH is bypassing the auth.
>
> I am not talking about HTTP Auth, but a simple SPA which fetches JSON
> data from REST API using Authorization header. The server-side can
> know that after loading the code, SPA will contact the REST API, so
> with HTTP2 you can push the JSON data to the client (setting
> Authorization header. as an expected header to be used in the request,
> and because it knows the logic of SPA it can also know the value of
> the Authorization header). This mitigates the question about whether
> you should embed this JSON into HTML response or provide it through
> API. Because you can push, there is little difference, but it can be
> seen as cleaner to separate HTML payload from data payload.
>
> > In the early-hints case, the 103 should presumably also be after the 
> > request has made it past HTTP Auth so any requests for subresources would 
> > use the cached credentials,
>
> That works with cookies, but not with REST API which uses bearer token
> in Authorization header. Or am I mistaken?
>
> > Content negotiation works fine for "as" types that have "accept" types 
> > associated with them.
>
> There are some issues opened for Chromium around that [1] [2], but
> maybe that is just implementation bug?
>
> > The PUSH case doesn't support custom authorization or content negotiation 
> > either.
>
> You can provide Authorization header in request headers in PUSH_PROMISE frame?
>
> > PUSH is going to go away, the only real question is how long it will take, 
> > not if enough edge cases are found to keep it.
>
> I think I understand that. But I am trying to understand what are
> alternatives for the edge cases I care about. Initially it looked like
> Early Hints is the proposed alternative, but it looks like it does not
> support all use cases.
>
> Currently to me it looks like the best bet is to move the bearer token
> to Cookie header. That one might be included when doing a preload
> through Link header.
>
> [1] https://bugs.chromium.org/p/chromium/issues/detail?id=962642
> [2] https://bugs.chromium.org/p/chromium/issues/detail?id=1072144
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAKLmikN%2Buz3hv44Sy0E8%3DyPAzkzkmPYtmiLjpSAEWSNp%3D2yg8g%40mail.gmail.com.


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2022-07-10 Thread Mitar
Hi!

On Sun, Jul 10, 2022 at 4:19 PM Patrick Meenan  wrote:
> Presumably if you are protecting something behind HTTP Auth then the PUSH is 
> happening AFTER the 401 challenge and the browser has made the follow-on 
> request with the Authorization header.  If not, then the resources aren't 
> actually protected by authorization and PUSH is bypassing the auth.

I am not talking about HTTP Auth, but a simple SPA which fetches JSON
data from a REST API using the Authorization header. The server side
can know that, after loading the code, the SPA will contact the REST
API, so with HTTP2 you can push the JSON data to the client (setting
the Authorization header as an expected header to be used in the
request; because the server knows the logic of the SPA, it can also
know the value of the Authorization header). This mitigates the
question about whether you should embed this JSON into the HTML
response or provide it through the API. Because you can push, there is
little difference, but it can be seen as cleaner to separate the HTML
payload from the data payload.

> In the early-hints case, the 103 should presumably also be after the request 
> has made it past HTTP Auth so any requests for subresources would use the 
> cached credentials,

That works with cookies, but not with REST API which uses bearer token
in Authorization header. Or am I mistaken?

> Content negotiation works fine for "as" types that have "accept" types 
> associated with them.

There are some issues opened for Chromium around that [1] [2], but
maybe that is just implementation bug?

> The PUSH case doesn't support custom authorization or content negotiation 
> either.

You can provide Authorization header in request headers in PUSH_PROMISE frame?

> PUSH is going to go away, the only real question is how long it will take, 
> not if enough edge cases are found to keep it.

I think I understand that. But I am trying to understand what the
alternatives are for the edge cases I care about. Initially it looked
like Early Hints was the proposed alternative, but it seems it does not
support all use cases.

Currently it looks to me like the best bet is to move the bearer token
to the Cookie header. That one might be included when doing a preload
through the Link header.

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=962642
[2] https://bugs.chromium.org/p/chromium/issues/detail?id=1072144


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAKLmikNd%3DpMSYN1XXA_Zed0%3DBUzmOcuMm1BqruaMS7D6ap7m1w%40mail.gmail.com.


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2022-07-10 Thread Mitar
Hi!

On Mon, Apr 25, 2022 at 9:09 AM Kenji Baheux 
wrote:

> The Authorization header should be supported in Early Hints.
> Please share a concrete example if this doesn't work as you'd hope.
>

Kenji, I think there is some misunderstanding about what I am concerned
with (what is possible with HTTP2 push but not with Early Hints).

So with Early Hints, the Authorization header is supported in the following way:

HTTP/1.1 103 Early Hints
Link: </api/data.json>; rel=preload; as=fetch
Authorization: Bearer foobar

But this does not mean that the browser will use that header when doing a
request to preload the /api/data.json. So preloading resources which
require authorization is not possible with Early Hints. But it is possible
with HTTP2 push.

A similar and related issue is with the Accept header and content
negotiation. It is not possible to define a Link header which would,
for example, request an application/json response when multiple
representations are possible. This has been reported independently at [1].

So to me it looks like Early Hints supports only simple public
requests: no authorization, no content negotiation. As such, it is not
suitable for preloading data from API endpoints, while HTTP2 push can
support such use cases.

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=962642

Am I missing something obvious about the Link header which would address
those concerns?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAKLmikN3SaqDDXgQrS%2B8mqk717FnvMfudjtYn7%3DKLq3Snt777w%40mail.gmail.com.


[Wikidata-bugs] [Maniphest] T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps

2022-06-24 Thread Mitar
Mitar closed this task as "Resolved".
Mitar claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T278031

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: ImreSamu, Addshore, Mitar, Aklapper, Busfault, Astuthiodit_1, 
karapayneWMDE, Invadibot, Universal_Omega, maantietaja, jannee_e, ItamarWMDE, 
Akuckartz, darthmon_wmde, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, 
Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, Hokwelum
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps

2022-06-24 Thread Mitar
Mitar added a comment.


  I checked `wikidata-20220620-all.json.bz2` and it contains now `modified` 
field (alongside other fields which are present in API).

TASK DETAIL
  https://phabricator.wikimedia.org/T278031

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: ImreSamu, Addshore, Mitar, Aklapper, Busfault, Astuthiodit_1, 
karapayneWMDE, Invadibot, Universal_Omega, maantietaja, jannee_e, ItamarWMDE, 
Akuckartz, darthmon_wmde, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, 
Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, Hokwelum
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2022-04-12 Thread Mitar
Hi!

Kenji, I have been following the progress around HTTP2 push and it looks to me
like all the focus is on pushes which happen at page load time. But there are
also pushes which happen later on, after the page has already loaded. A
concrete example: I have an app which subscribes to notifications from the
server (using SSE or WebSockets); when it gets a notification that some
resource changed, the app makes an HTTP request to fetch the new resource (to
then update the DOM reactively). By separating the notification from the
payload and not pushing the data itself over the WebSocket, those resources can
be cached so that the next time the page loads they can be reused. With HTTP2
push the server can send the resource to the client together with the
notification itself, so that the data is available as soon as the notification
arrives. I do not see how something like this is possible with Early Hints?

Another limitation of Early Hints seems to be that resources which require an
Authorization header cannot be preloaded, am I mistaken? With HTTP2 push you
can push such a resource and add the corresponding anticipated request header.


Mitar

On Wed, Mar 16, 2022 at 7:05 AM Kenji Baheux 
wrote:

> Hi Thomas,
>
> I'm part of the team working on Early Hints.
>
> On Wed, Mar 16, 2022 at 6:58 AM BIANCONI Thomas <
> thomas.bianc...@loreal.com> wrote:
>
>> I am sad to read this...
>> A new step before the deprecation of server push.
>>
>> I would love to see comparaison in term of performance between server
>> push and early hint.
>> On a pure theoric point of view early hint starts during the html parsing
>> whereas the server push start with the response header. So server push by
>> design is better.
>>
>
> If I understand correctly, I believe that there is some misunderstanding
> about Early Hints.
> Clearly, it's on us to make this easier to understand. Sorry...
> We'll put extra efforts in providing clear & detailed developer
> documentation when Early Hints ships.
>
> In the meantime, here is a high level summary of what Early Hints is, and
> how it works:
>
>1. Early Hints is a status code (103) which is used in HTTP responses
>while the server is preparing the final response. This intermediate
>response can include other HTTP headers, in particular LINK REL headers
>such as preload or preconnect.
>2. In some cases, it can take time for the server to prepare the main
>response: accessing the DB, having an edge cache go talk to the origin
>server, etc. So, the idea is to speed up overall page load times by giving
>the browser hints about what it might do while waiting for the actual
>response. Typically, the hints are about critical sub-resources or origins
>which would be used by the final response.
>3. The browser processes these hints, and decides to preconnect or
>preload any missing origins/resources while waiting for the final 200 OK
>response (usually containing the main resource). Since the browser got some
>work done ahead of time, the overall page load time is improved.
>
> In other words, the key point here is that Early Hints doesn't start
> during the HTML parsing: it starts with the non-final response headers, way
> before HTML parsing kicks in since that is blocked on the final response.
>
> See this section of the RFC
> <https://datatracker.ietf.org/doc/html/rfc8297#:~:text=The%20following%20example%20illustrates%20a%20typical%20message%20exchange%20that%0A%20%20%20involves%20a%20103%20(Early%20Hints)%20response.>
> for an example of how this looks at the HTTP level.
>
>
>
>
>> Regarding the complexity to put it in place early hints is easy when you
>> serve different page but for Single Page Application the build process
>> don't generate differentiate serving based on the route since the routing
>> of the application is generally managed in the frontend.
>> So for Single Page Application to managed server push not global to all
>> route it will more complexe to include it in the build process.
>>
>
> The MPA angle is indeed easier, deployment wise.
> We'll look into the SPA case in more details including discussion with
> various framework authors.
>
> I hope this was useful.
>
>
>>
>> Just wanted to share my feeling about this whole topic.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Regards
>>
>>
>>
>> *Thomas BIANCONI*
>>
>> Head of Data Technologies
>>
>> & Data Privacy Champion
>>
>> Global CDMO Team
>>
>> 41 Rue Martre - 92110 Clichy
>>
>> *Mob* : +33 (0) 6 15 35 33 57 <+33%206%2015%2035%2033%2057>
>>
>> *Ph* : +33 (0) 1 47 56 45 95 <+33%201%2047%2056%204

[Xmldatadumps-l] Re: Missing pages/stale data in HTML dumps

2022-04-05 Thread Mitar
Hi!

Thanks for noticing and sharing. Another known issue with HTML dumps
is that it seems that categories and templates are not always
extracted: https://phabricator.wikimedia.org/T300124


Mitar

On Tue, Apr 5, 2022 at 12:59 PM Jan Berkel  wrote:
>
> Hello,
>
> just a heads-up for anyone using HTML dumps, apart from the missing 
> namespaces issue already mentioned on this list, there also seem to be entire 
> pages missing, and some of the included page data is outdated and does not 
> contain the latest changes. I have no idea how many pages are affected.
>
> phabricator ticket with more details: 
> https://phabricator.wikimedia.org/T305407
>
>  – Jan
> ___
> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


Re: [blink-dev] Re: Intent to Remove: HTTP/2 and gQUIC server push

2022-02-25 Thread Mitar
Hi!

So I finally got to experiment a bit with HTTP2 push, preload link
header, and 103 early hints and to compare them, especially in the
context of API endpoints (similar to Vulcain [1]). My observations
are:

Go does not yet support 103 Early Hints (but PRs exist [2][3]). 103
Early Hints seems very verbose if the main use case is to send preload
Link headers: "rel=preload;" could maybe just be implied? And if I want
to specify "crossorigin=use-credentials" across multiple links, it
becomes repetitive. Maybe it would be simpler to have a 103 Preload
with slightly modified syntax which would allow one to group preload
links together based on shared attributes.

A regular preload Link header seems to be acted upon only after both
the HTTP headers and the response body have been consumed by the
browser. So it seems to me that the whole point of 103 Early Hints is
just that browsers do not process HTTP headers immediately when they
receive them (even if I force-flush them before sending any of the
response body). This makes me wonder whether it would not be easier to
standardize early processing of HTTP headers, and acting early upon the
Link headers in them, instead of the whole 103 Early Hints approach
which requires new support across the full HTTP stack. Am I missing
something here? Why do we even need 103 Early Hints?
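
Concretely, the experiment I mean looks roughly like this (a sketch;
the handler and path are made up): send the preload Link header with
the final response headers and flush them before the body is ready,
which is more or less what 103 would achieve.

import (
    "net/http"
    "time"
)

func handler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/html")
    // Send the preload hint with the response headers and force-flush them
    // out before the (slow) body is produced.
    w.Header().Set("Link", "</api/data.json>; rel=preload; as=fetch")
    w.WriteHeader(http.StatusOK)
    if f, ok := w.(http.Flusher); ok {
        f.Flush()
    }
    time.Sleep(2 * time.Second) // stand-in for slow body generation
    w.Write([]byte("<!DOCTYPE html>..."))
}

In my testing, the browser still acts on the Link header only once the
body has also arrived, which is the behavior I find surprising.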

Even in 2022, HTTP2 push is sadly lacking high-quality support across
the stack, and this makes it hard to use. Go does not support it in the
HTTP client library [4] and doing HTTP pushes seems to be slow [5].
Neither Chrome nor Firefox provides any insight into HTTP pushes in
developer tools, so it is really hard to observe and debug them
(timings, headers, whether they have been canceled by the browser,
informing a developer when a push has not been used). With such poor
support across the stack I no longer wonder why people have a hard time
using HTTP pushes, and using them effectively.

So my conclusion is: given how hard it has been to get HTTP pushes
implemented well across the stack, I worry that the same thing will
happen with 103 Early Hints. E.g.: will they be exposed in developer
tools, and how? How will one be able to observe when they are in effect
and when they are not? I wonder whether in 5 years we will be at the
same point, thinking about removing 103 Early Hints. Simply processing
HTTP headers early and acting upon any Link headers in them might be
much simpler to both implement and expose in developer tools in
browsers (almost no work needed there; resources would just be fetched
earlier in the timeline), with no need for any changes to the rest of
the HTTP stack.

[1] https://github.com/dunglas/vulcain
[2] https://github.com/golang/go/pull/42597
[3] https://github.com/golang/net/pull/96
[4] https://github.com/golang/go/issues/18594
[5] https://github.com/golang/go/issues/51361


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion on the web visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAKLmikPu72BG56Jng13iu-JLQ5iw4v97D%3DvhmaP5-pF4JT_ydw%40mail.gmail.com.


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-09 Thread Mitar
Hi!

I made this ticket [1] to track regaining access to metadata as a dump.

[1] https://phabricator.wikimedia.org/T301039


Mitar

On Tue, Feb 8, 2022 at 2:32 AM Platonides  wrote:
>
> The metadata used to be included in the image table, but it was changed 6 
> months ago out to External Storage. See 
> https://phabricator.wikimedia.org/T275268#7178983
>
>
> On Fri, 4 Feb 2022 at 20:44, Mitar  wrote:
>>
>> Hi!
>>
>> Will do. Thanks.
>>
>> After going through the image table dump, it seems not all data is in
>> there. For example, page count for Djvu files is missing. Instead of
>> metadata in the image table dump, a reference to text table [1] is
>> provided:
>>
>> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>>
>> But that table itself does not seem to be available as a dump? Or am I
>> missing something or misunderstanding something?
>>
>> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>>
>>
>> Mitar
>>
>> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF  wrote:
>> >
>> > This looks great! If you like, you might add the link and a  brief 
>> > description to this page: 
>> > https://meta.wikimedia.org/wiki/Data_dumps/Other_tools  so that more 
>> > people can find and use the library :-)
>> >
>> > (Anyone else have tools they wrote and use, that aren't on this list? 
>> > Please add them!)
>> >
>> > Ariel
>> >
>> > On Fri, Feb 4, 2022 at 2:31 AM Mitar  wrote:
>> >>
>> >> Hi!
>> >>
>> >> If it is useful to anyone else, I have added to my library [1] in Go
>> >> for processing dumps support for processing SQL dumps directly,
>> >> without having to load them into a database. So one can process them
>> >> directly to extract data, like dumps in other formats.
>> >>
>> >> [1] https://gitlab.com/tozd/go/mediawiki
>> >>
>> >>
>> >> Mitar
>> >>
>> >> On Thu, Feb 3, 2022 at 9:13 AM Mitar  wrote:
>> >> >
>> >> > Hi!
>> >> >
>> >> > I see. Thanks.
>> >> >
>> >> >
>> >> > Mitar
>> >> >
>> >> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF  
>> >> > wrote:
>> >> > >
>> >> > > The media/file descriptions contained in the dump are the wikitext of 
>> >> > > the revisions of pages with the File: prefix, plus the metadata about 
>> >> > > those pages and revisions (user that made the edit, timestamp of 
>> >> > > edit, edit comment, and so on).
>> >> > >
>> >> > > Width and hieght of the image, the media type, the sha1 of the image 
>> >> > > and a few other details can be obtained by looking at the 
>> >> > > image.sql.gz file available for download for the dumps for each wiki. 
>> >> > > Have a look at https://www.mediawiki.org/wiki/Manual:Image_table for 
>> >> > > more info.
>> >> > >
>> >> > > Hope that helps!
>> >> > >
>> >> > > Ariel Glenn
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar  wrote:
>> >> > >>
>> >> > >> Hi!
>> >> > >>
>> >> > >> I am trying to find a dump of all imageinfo data [1] for all files on
>> >> > >> Commons. I thought that "Articles, templates, media/file 
>> >> > >> descriptions,
>> >> > >> and primary meta-pages" XML dump would contain that, given the
>> >> > >> "media/file descriptions" part, but it seems this is not the case. Is
>> >> > >> there a dump which contains that information? And what is "media/file
>> >> > >> descriptions" then? Wiki pages of files?
>> >> > >>
>> >> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>> >> > >>
>> >> > >>
>> >> > >> Mitar
>> >> > >>
>> >> > >> --
>> >> > >> http://mitar.tnode.com/
>> >> > >> https://twitter.com/mitar_m
>> >> > >> ___
>> >> > >> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> >> > >> To unsubscribe send an email to 
>> >> > >> xmldatadumps-l-le...@lists.wikimedia.org
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > http://mitar.tnode.com/
>> >> > https://twitter.com/mitar_m
>> >>
>> >>
>> >>
>> >> --
>> >> http://mitar.tnode.com/
>> >> https://twitter.com/mitar_m
>>
>>
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m
>> ___
>> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-04 Thread Mitar
Hi!

Will do. Thanks.

After going through the image table dump, it seems not all data is in
there. For example, page count for Djvu files is missing. Instead of
metadata in the image table dump, a reference to text table [1] is
provided:

{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}

But that table itself does not seem to be available as a dump? Or am I
missing something or misunderstanding something?

[1] https://www.mediawiki.org/wiki/Manual:Text_table


Mitar

On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF  wrote:
>
> This looks great! If you like, you might add the link and a  brief 
> description to this page: 
> https://meta.wikimedia.org/wiki/Data_dumps/Other_tools  so that more people 
> can find and use the library :-)
>
> (Anyone else have tools they wrote and use, that aren't on this list? Please 
> add them!)
>
> Ariel
>
> On Fri, Feb 4, 2022 at 2:31 AM Mitar  wrote:
>>
>> Hi!
>>
>> If it is useful to anyone else, I have added to my library [1] in Go
>> for processing dumps support for processing SQL dumps directly,
>> without having to load them into a database. So one can process them
>> directly to extract data, like dumps in other formats.
>>
>> [1] https://gitlab.com/tozd/go/mediawiki
>>
>>
>> Mitar
>>
>> On Thu, Feb 3, 2022 at 9:13 AM Mitar  wrote:
>> >
>> > Hi!
>> >
>> > I see. Thanks.
>> >
>> >
>> > Mitar
>> >
>> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF  wrote:
>> > >
>> > > The media/file descriptions contained in the dump are the wikitext of 
>> > > the revisions of pages with the File: prefix, plus the metadata about 
>> > > those pages and revisions (user that made the edit, timestamp of edit, 
>> > > edit comment, and so on).
>> > >
>> > > Width and hieght of the image, the media type, the sha1 of the image and 
>> > > a few other details can be obtained by looking at the image.sql.gz file 
>> > > available for download for the dumps for each wiki. Have a look at 
>> > > https://www.mediawiki.org/wiki/Manual:Image_table for more info.
>> > >
>> > > Hope that helps!
>> > >
>> > > Ariel Glenn
>> > >
>> > >
>> > >
>> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar  wrote:
>> > >>
>> > >> Hi!
>> > >>
>> > >> I am trying to find a dump of all imageinfo data [1] for all files on
>> > >> Commons. I thought that "Articles, templates, media/file descriptions,
>> > >> and primary meta-pages" XML dump would contain that, given the
>> > >> "media/file descriptions" part, but it seems this is not the case. Is
>> > >> there a dump which contains that information? And what is "media/file
>> > >> descriptions" then? Wiki pages of files?
>> > >>
>> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>> > >>
>> > >>
>> > >> Mitar
>> > >>
>> > >> --
>> > >> http://mitar.tnode.com/
>> > >> https://twitter.com/mitar_m
>> > >> ___
>> > >> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> > >> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
>> >
>> >
>> >
>> > --
>> > http://mitar.tnode.com/
>> > https://twitter.com/mitar_m
>>
>>
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

If it is useful to anyone else, I have added to my library [1] in Go
for processing dumps support for processing SQL dumps directly,
without having to load them into a database. So one can process them
directly to extract data, like dumps in other formats.

[1] https://gitlab.com/tozd/go/mediawiki


Mitar

On Thu, Feb 3, 2022 at 9:13 AM Mitar  wrote:
>
> Hi!
>
> I see. Thanks.
>
>
> Mitar
>
> On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF  wrote:
> >
> > The media/file descriptions contained in the dump are the wikitext of the 
> > revisions of pages with the File: prefix, plus the metadata about those 
> > pages and revisions (user that made the edit, timestamp of edit, edit 
> > comment, and so on).
> >
> > Width and hieght of the image, the media type, the sha1 of the image and a 
> > few other details can be obtained by looking at the image.sql.gz file 
> > available for download for the dumps for each wiki. Have a look at 
> > https://www.mediawiki.org/wiki/Manual:Image_table for more info.
> >
> > Hope that helps!
> >
> > Ariel Glenn
> >
> >
> >
> > On Wed, Feb 2, 2022 at 10:45 PM Mitar  wrote:
> >>
> >> Hi!
> >>
> >> I am trying to find a dump of all imageinfo data [1] for all files on
> >> Commons. I thought that "Articles, templates, media/file descriptions,
> >> and primary meta-pages" XML dump would contain that, given the
> >> "media/file descriptions" part, but it seems this is not the case. Is
> >> there a dump which contains that information? And what is "media/file
> >> descriptions" then? Wiki pages of files?
> >>
> >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
> >>
> >>
> >> Mitar
> >>
> >> --
> >> http://mitar.tnode.com/
> >> https://twitter.com/mitar_m
> >> ___
> >> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
> >> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

I see. Thanks.


Mitar

On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF  wrote:
>
> The media/file descriptions contained in the dump are the wikitext of the 
> revisions of pages with the File: prefix, plus the metadata about those pages 
> and revisions (user that made the edit, timestamp of edit, edit comment, and 
> so on).
>
> Width and hieght of the image, the media type, the sha1 of the image and a 
> few other details can be obtained by looking at the image.sql.gz file 
> available for download for the dumps for each wiki. Have a look at 
> https://www.mediawiki.org/wiki/Manual:Image_table for more info.
>
> Hope that helps!
>
> Ariel Glenn
>
>
>
> On Wed, Feb 2, 2022 at 10:45 PM Mitar  wrote:
>>
>> Hi!
>>
>> I am trying to find a dump of all imageinfo data [1] for all files on
>> Commons. I thought that "Articles, templates, media/file descriptions,
>> and primary meta-pages" XML dump would contain that, given the
>> "media/file descriptions" part, but it seems this is not the case. Is
>> there a dump which contains that information? And what is "media/file
>> descriptions" then? Wiki pages of files?
>>
>> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>>
>>
>> Mitar
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m
>> ___
>> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Access imageinfo data in a dump

2022-02-02 Thread Mitar
Hi!

I am trying to find a dump of all imageinfo data [1] for all files on
Commons. I thought that "Articles, templates, media/file descriptions,
and primary meta-pages" XML dump would contain that, given the
"media/file descriptions" part, but it seems this is not the case. Is
there a dump which contains that information? And what is "media/file
descriptions" then? Wiki pages of files?

[1] https://www.mediawiki.org/wiki/API:Imageinfo


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T174029: Two kinds of JSON dumps?

2022-01-26 Thread Mitar
Mitar added a comment.


  I would vote for simply including hashes in dumps. They would make dumps 
bigger, but they would be consistent with the output of `EntityData`, which 
currently includes hashes for all snaks.

TASK DETAIL
  https://phabricator.wikimedia.org/T174029

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, Lydia_Pintscher, aude, WMDE-leszek, thiemowmde, ArielGlenn, hoo, 
daniel, Addshore, Lucas_Werkmeister_WMDE, Aklapper, Invadibot, maantietaja, 
Akuckartz, Dinadineke, DannyS712, Nandana, lucamauri, tabish.shaikh91, Lahi, 
Gq86, GoranSMilovanovic, Jayprakash12345, JakeTheDeveloper, QZanden, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, TheDJ, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T171607: Main snak and reference snaks do not include hash in JSON output

2022-01-26 Thread Mitar
Mitar added a comment.


  Just a followup from somebody coming to the Wikidata dumps in 2021: it is 
really confusing that the dumps do not include hashes, especially because 
`EntityData` seems to show them now for all snaks (main, qualifiers, 
references). So when one is debugging this, using `EntityData` as a reference 
throws you off.
  
  I would vote for the inclusion of hashes in dumps as well. I think this is a 
backwards-compatible change, as it just adds to the existing data. Having 
hashes in all snaks makes it easier to have a reference with which you can 
point back to a snak in your own system (or UI).

TASK DETAIL
  https://phabricator.wikimedia.org/T171607

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: thiemowmde, Mitar
Cc: Mitar, Lydia_Pintscher, WMDE-leszek, thiemowmde, gerritbot, Addshore, 
Aklapper, daniel, Lucas_Werkmeister_WMDE, PokestarFan, Invadibot, maantietaja, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata] Re: Timezone, before, and after fields in JSON dump

2022-01-10 Thread Mitar
Hi!

On Mon, Jan 10, 2022 at 4:50 PM Lydia Pintscher
 wrote:
> Thanks for checking. Do you have a few examples so we can have a closer look?

There are many. A few cases (many seem to be about a reference with P813):

Q5198 P21 reference 1 snaks P813 0: timezone 60
Q5664 P20 reference 1 snaks P813 0: timezone 60
Q5721 P106 reference 0 snaks P813 0: timezone 60
Q5869 P194 reference 0 snaks P813 0: timezone -5
Q5826 P194 reference 0 snaks P813 0: timezone -5
Q5816 P106 reference 0 snaks P813 0: timezone 60
Q11618 P4632 reference 0 snaks P813 0: before 1
Q12018 P4632 reference 0 snaks P813 0: before 1
Q12773 P106 reference 0 snaks P813 0: timezone 60
Q12773 P106 reference 0 snaks P813 0: timezone 60
Q13283 P1999 reference 0 snaks P813 0: after 1
Q13293 P1999 reference 0 snaks P813 0: after 1
Q13307 P1999 reference 0 snaks P813 0: after 1
Q13334 P355 reference 0 snaks P813 0: before 1
Q13353 P2853 reference 0 snaks P813 0: timezone 120
Q13361 P1999 reference 0 snaks P813 0: after 1
Q14430 P106 reference 0 snaks P813 0: timezone 60
Q14524 P106 reference 1 snaks P813 0: timezone 60
Q15174 P194 reference 0 snaks P813 0: timezone -5
Q16019 P21 reference 0 snaks P813 0: timezone 60
Q16285 P106 reference 1 snaks P813 0: timezone 60
Q16285 P106 reference 1 snaks P813 0: timezone 60
Q16389 P106 reference 0 snaks P813 0: timezone 60
Q16403 P4632 reference 0 snaks P813 0: before 1
Q16572 P194 reference 0 snaks P813 0: timezone -5
Q16967 P194 reference 0 snaks P813 0: timezone -5
Q18809 P106 reference 0 snaks P813 0: timezone 60
Q18809 P106 reference 0 snaks P813 0: timezone 60
Q19214 P1001 reference 0 snaks P813 0: timezone -5
Q20456 P4632 reference 0 snaks P813 0: before 1
Q22432 P4632 reference 0 snaks P813 0: before 1

You cannot see them in the web UI, but you can see them in the JSON, e.g.:

https://www.wikidata.org/wiki/Special:EntityData/Q5198.json

A few which are not related to P813 are:

Q28287 P2046 qualifier P585: timezone 1
Q38573 P166 qualifier P585: after 1
Q54764 P2046 qualifier P585: timezone 1
Q82986 P580: after 1

There are really many of them. I can produce the whole list if you need that.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: +0000-00-00T00:00:00Z in JSON dump

2022-01-10 Thread Mitar
Hi!

I took some time and went over all the cases I found, and all of them
were simply bad data. I suspect that most of them were added in some
automated way which passed this timestamp in when there was no data. So
I cleaned them up or fixed them (in a few cases the right value was
"unknown" with a range, in some cases it was 1 BCE, but in most cases I
just removed the claim, because it is not only false, it is simply
invalid; it is not even a valid timestamp).

You can see examples in my recent changes [1].

At this point I would rather ask how this got in (why it is not
rejected at insertion time) and, even more interesting, why the web UI
does not show any warning about those values. For many other cases you
get various warnings about possibly invalid data, but not here. So it
would be great if a warning were shown next to such a timestamp value.
Of course, even better would be to prevent insertion (because in 99% of
cases it means somebody is blindly inserting a default zero value).

[1] 
https://www.wikidata.org/w/index.php?title=Special:Contributions/Mitar==500=Mitar


Mitar

On Mon, Jan 10, 2022 at 4:50 PM Lydia Pintscher
 wrote:
>
> Hey Mitar,
>
> Also here a few examples would help to better understand what's going on.
>
>
> Cheers
> Lydia
>
> On Sun, Jan 9, 2022 at 9:52 AM Mitar  wrote:
> >
> > Hi!
> >
> > I have been processing a recent Wikidata JSON dump. I have noticed
> > that some claims have +-00-00T00:00:00Z as the time value. My
> > understanding is that those are invalid values for time, at least
> > according to [1]. I think they can be safely removed, yes?
> >
> > [1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
> > ___
> > Wikidata mailing list -- wikidata@lists.wikimedia.org
> > To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
>
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Xmldatadumps-l] ANN: A Go package providing utilities for processing Wikipedia and Wikidata dumps

2022-01-09 Thread Mitar
Hi!

I just published the first version of a Go package which provides
utilities for processing
Wikidata entities JSON dumps and Wikimedia Enterprise HTML dumps. It
processes them in parallel on multiple cores, so processing is rather
fast. I hope it will be useful to others, too.

https://gitlab.com/tozd/go/mediawiki

Any feedback is welcome.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Wikitech-l] ANN: A Go package providing utilities for processing Wikipedia and Wikidata dumps

2022-01-09 Thread Mitar
Hi!

I just published the first version of a Go package which provides
utilities for processing
Wikidata entities JSON dumps and Wikimedia Enterprise HTML dumps. It
processes them in parallel on multiple cores, so processing is rather
fast. I hope it will be useful to others, too.

https://gitlab.com/tozd/go/mediawiki

Any feedback is welcome.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/


[Wiki-research-l] ANN: A Go package providing utilities for processing Wikipedia and Wikidata dumps

2022-01-09 Thread Mitar
Hi!

I just published the first version of a Go package which provides
utilities for processing
Wikidata entities JSON dumps and Wikimedia Enterprise HTML dumps. It
processes them in parallel on multiple cores, so processing is rather
fast. I hope it will be useful to others, too.

https://gitlab.com/tozd/go/mediawiki

Any feedback is welcome.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org


[Wikidata] +0000-00-00T00:00:00Z in JSON dump

2022-01-09 Thread Mitar
Hi!

I have been processing a recent Wikidata JSON dump. I have noticed
that some claims have +0000-00-00T00:00:00Z as the time value. My
understanding is that those are invalid values for time, at least
according to [1]. I think they can be safely removed, yes?

[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Timezone, before, and after fields in JSON dump

2022-01-09 Thread Mitar
Hi!

I have been processing a recent Wikidata JSON dump. According to
documentation [1], time datavalue has timezone, before and after
fields, which are documented as currently not used. But I noticed that
in the dump some claims do have them set. What should be done about
them? Are they errors? Are they information? Can they be safely
ignored? Should those claims be updated in Wikidata to remove those
fields?

I can provide a list of those if anyone is interested.

[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


Re: [go-nuts] ANN: A new package to wrap errors and record a stack trace

2022-01-03 Thread Mitar
Hi!

On Mon, Jan 3, 2022 at 5:34 PM Gaurav Maheshwari
 wrote:
> 1. Do you think it would be better to expose methods which can allow to 
> control Stack depth, e.g. it is quite common to wrap external libraries in a 
> thin-proxy library in large codebases but dropping proxy layer from 
> stacktrace would be useful for readability. Similar to crdb's WrapWithDepthf ?

I think you do not want to control the stack depth (which is
hard-coded at 32), but the depth at which the stack starts. I must
admit that one of the reasons for this package was that I first tried
extending github.com/pkg/errors by wrapping its functions, but then
those wrapper functions ended up in all stack traces; that was a
problem precisely because I was wrapping github.com/pkg/errors itself.
But I do not fully understand your use case, though. Maybe open an
issue in the repository to discuss this further there? At first glance
I do not see an easy way to add such a parameter to the existing
functions, and adding a whole set of other functions might be
overkill. So it really depends on what your use case is here.
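
To illustrate what I mean by the depth at which the stack starts, here
is a minimal sketch of how recording with a "skip" parameter typically
works, using only runtime.Callers from the standard library. This is
not the package's actual API, just the general idea a
WrapWithDepthf-style helper would expose:

package errstack

import "runtime"

// callers records the current stack, dropping "skip" additional
// frames on top of it (for example, a thin proxy wrapper). The depth
// limit of 32 matches the hard-coded depth mentioned above.
func callers(skip int) []uintptr {
    const depth = 32
    pc := make([]uintptr, depth)
    // +2 skips runtime.Callers itself and this helper function.
    n := runtime.Callers(skip+2, pc)
    return pc[:n]
}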

> 2. Given stackTrace interface is not exported, how could a user access the 
> Stack trace associated with an error programmatically? The use-case I have in 
> mind is to configure structured logging such that stack trace associated with 
> an error goes in a  separate log key?

That is the same as in github.com/pkg/errors. In Go a package does not
have to export an interface for you to be able to use it; you just
define the same interface with the same signatures in your own code.
The difference with github.com/pkg/errors is that the interface is
defined as:

type stackTracer interface {
    StackTrace() []uintptr
}

So there is no custom type in the StackTrace signature, which allows
one to define this interface without even having to import
gitlab.com/tozd/go/errors.
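
For the structured-logging use case, here is a minimal sketch of how
that could look; the package and function names are made up, and only
the stackTracer interface above and the standard library are assumed:

package errlog

import (
    "errors"
    "log"
    "runtime"
)

// stackTracer mirrors the interface described above; no import of
// gitlab.com/tozd/go/errors is needed to declare it.
type stackTracer interface {
    StackTrace() []uintptr
}

// logStackTrace extracts the recorded stack trace, if any, and logs
// it separately from the error message (e.g., under its own log key).
func logStackTrace(err error) {
    var st stackTracer
    if !errors.As(err, &st) {
        return
    }
    frames := runtime.CallersFrames(st.StackTrace())
    for {
        frame, more := frames.Next()
        log.Printf("%s\n\t%s:%d", frame.Function, frame.File, frame.Line)
        if !more {
            break
        }
    }
}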

> Can you please explain this statement as well?
> > Makes sure a stack trace is not recorded multiple times unnecessarily.

github.com/pkg/errors' functions unconditionally recorded a stack
trace every time you called any of them, even when the error you were
wrapping already had a stack trace. Most functions in this package add
a stack trace only if the error does not already have one. The only
exception is Wrap, which records it again.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikMJrphARNnwppZsMfsT5c0KJ-bmnCS-eHg5_i7ZHtaheQ%40mail.gmail.com.


Re: [go-nuts] ANN: A new package to wrap errors and record a stack trace

2022-01-03 Thread Mitar
Hi!

Yes, I have. My view is that both github.com/pkg/errors and
gitlab.com/tozd/go/errors aim at being more or less a drop-in
replacement for core errors, but with stack traces added to errors.
Since Go now has Errorf, which can wrap existing errors to augment
their messages, the utility functions in github.com/pkg/errors and
gitlab.com/tozd/go/errors are not really necessary, but they still
make some common cases easy. So my view is that these two packages are
something which could be used in a wide range of packages to produce
errors, whenever a package also wants to return a stack trace with the
error.

On the other hand, github.com/cockroachdb/errors has so many features
that to effectively use it one has to coordinate across the codebase
how exactly its different features are used, to keep things
consistent. So it is less useful in my opinion for 3rd party packages
and more as something you would use in your own large codebase (which
you more or less control). But it has all possible features you might
ever want. Maybe it would be cool if github.com/cockroachdb/errors
were built on top of gitlab.com/tozd/go/errors, to use when you need
all those additional features.

Additionally, I am personally not too convinced about storing end-user
hints in errors themselves. Maybe that works for some people, but to
me errors are useful for controlling program flow and for debugging by
developers, while showing hints to end users is generally something
one has to do separately (e.g., because you have to translate that
hint to the user's language anyway). So the pattern I prefer, and
which gitlab.com/tozd/go/errors enables, is to have a set of base
errors (e.g., io.EOF) which you augment with a stack trace at the
point where they happen, and then when you are handling errors you use
`errors.Is` to determine which of the base errors happened and map it
to a message for the end user, in their language. So in a way,
github.com/cockroachdb/errors has too much stuff for me, and I prefer
something leaner.
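
Here is a minimal sketch of that pattern, using only the standard
library; the error names and messages are made up, and the wrapping
step where the stack trace gets recorded is only indicated in a
comment:

package fetch

import (
    "errors"
    "io"
)

// A base (sentinel) error, analogous to io.EOF. Where the error
// happens, one would wrap it so that a stack trace gets recorded (for
// example with this package's Errorf, as mentioned in the
// announcement); errors.Is below still matches the wrapped base
// error.
var ErrNotFound = errors.New("not found")

// userMessage maps base errors to end-user text; translation into the
// user's language would normally happen here as well.
func userMessage(err error) string {
    switch {
    case errors.Is(err, io.EOF):
        return "The input ended unexpectedly."
    case errors.Is(err, ErrNotFound):
        return "We could not find what you were looking for."
    default:
        return "Something went wrong."
    }
}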


Mitar

On Mon, Jan 3, 2022 at 7:27 AM Gaurav Maheshwari
 wrote:
>
> Did you check https://github.com/cockroachdb/errors ? How does it differ from 
> cockroachdb/errors?
>
> On Mon, 3 Jan 2022 at 08:05, Mitar  wrote:
>>
>> Hi!
>>
>> I have recently published a new package which provides errors with a
>> stack trace, similar to now archived github.com/pkg/errors, but
>> attempting to address many issues the community has identified since
>> and updating it to new Go features. It is almost compatible with
>> github.com/pkg/errors codewise and very familiar human wise.
>>
>> Now you can use Errorf to both wrap an existing error, format the
>> error message, and record a stack trace.
>>
>> https://gitlab.com/tozd/go/errors
>>
>> Check it out. Any feedback is welcome.
>>
>>
>> Mitar
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/golang-nuts/CAKLmikOVd%3DaV01QQf8xUE5vOxqarz7dsrit0V3pgT_XsaQuGow%40mail.gmail.com.
>
>
>
> --
> Gaurav Maheshwari



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikPuEWo9uWvXK_A7yq6F6%2BwLzV_hrrD7Kt1pfbPzWMqp6Q%40mail.gmail.com.


[go-nuts] ANN: A new package to wrap errors and record a stack trace

2022-01-02 Thread Mitar
Hi!

I have recently published a new package which provides errors with a
stack trace, similar to now archived github.com/pkg/errors, but
attempting to address many issues the community has identified since
and updating it to new Go features. It is almost compatible with
github.com/pkg/errors codewise and very familiar human wise.

Now you can use Errorf to both wrap an existing error, format the
error message, and record a stack trace.

https://gitlab.com/tozd/go/errors

Check it out. Any feedback is welcome.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikOVd%3DaV01QQf8xUE5vOxqarz7dsrit0V3pgT_XsaQuGow%40mail.gmail.com.


[Wikitech-l] Re: [Wiki-research-l] Wikimedia Enterprise HTML dumps available for public download

2022-01-02 Thread Mitar
Hi!

Thank you for the reply. I made the following tasks:

https://phabricator.wikimedia.org/T298436
https://phabricator.wikimedia.org/T298437


Mitar

On Sat, Jan 1, 2022 at 6:07 PM Ariel Glenn WMF  wrote:
>
> Hello Mitar! I'm glad you are finding the Wikimedia Enterprise dumps useful.
>
> For your tar.gz question, this is the format that the Wikimedia Enterprise 
> dataset consumers prefer, from what I understand. But I would suggest that if 
> you are interested in other formats, you might open a task on phabricator 
> with a feature request, and add  the Wikimedia Enterprise project tag ( 
> https://phabricator.wikimedia.org/project/view/4929/ ).
>
> As to the API, I'm only familiar with the endpoints for bulk download, so 
> you'll want to ask the Wikimedia Enterprise folks, or have a look at their 
> API documentation here: 
> https://www.mediawiki.org/wiki/Wikimedia_Enterprise/Documentation
>
> Ariel
>
>
> On Sat, Jan 1, 2022 at 4:30 PM Mitar  wrote:
>>
>> Hi!
>>
>> Awesome!
>>
>> Is there any reason they are tar.gz files of one file and not simply
>> bzip2 of the file contents? Wikidata dumps are bzip2 of one json and
>> that allows parallel decompression. Having both tar (why tar of one
>> file at all?) and gz in there really requires one to first decompress
>> the whole thing before you can process it in parallel. Is there some
>> other way I am missing?
>>
>> Wikipedia dumps are done with multistream bzip2 with an additional
>> index file. That could be nice here too, if one could have an index
>> file and then be able to immediately jump to a JSON line for
>> corresponding articles.
>>
>> Also, is there an API endpoint or Special page which can return the
>> same JSON for a single Wikipedia page? The JSON structure looks very
>> useful by itself (e.g., not in bulk).
>>
>>
>> Mitar
>>
>>
>> On Tue, Oct 19, 2021 at 4:57 PM Ariel Glenn WMF  wrote:
>> >
>> > I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
>> > October 17-18th are available for public download; see
>> > https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
>> > expect to make updated versions of these files available around the 1st/2nd
>> > of the month and the 20th/21st of the month, following the cadence of the
>> > standard SQL/XML dumps.
>> >
>> > This is still an experimental service, so there may be hiccups from time to
>> > time. Please be patient and report issues as you find them. Thanks!
>> >
>> > Ariel "Dumps Wrangler" Glenn
>> >
>> > [1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more
>> > about Wikimedia Enterprise and its API.
>> > ___
>> > Wiki-research-l mailing list -- wiki-researc...@lists.wikimedia.org
>> > To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
>>
>>
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m
>> ___
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/


[Wikitech-l] Re: [Wiki-research-l] Wikimedia Enterprise HTML dumps available for public download

2022-01-01 Thread Mitar
Hi!

Awesome!

Is there any reason they are tar.gz files of one file and not simply
bzip2 of the file contents? Wikidata dumps are bzip2 of one json and
that allows parallel decompression. Having both tar (why tar of one
file at all?) and gz in there really requires one to first decompress
the whole thing before you can process it in parallel. Is there some
other way I am missing?

Wikipedia dumps are done with multistream bzip2 with an additional
index file. That could be nice here too, if one could have an index
file and then be able to immediately jump to a JSON line for
corresponding articles.
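
For example, with a byte offset from such an index file one could
decompress just the relevant stream directly. A minimal Go sketch,
where the file name and offset are made up and only the standard
library is used:

package main

import (
    "bufio"
    "compress/bzip2"
    "fmt"
    "io"
    "os"
)

func main() {
    // Offset of the bzip2 stream containing the wanted article; in
    // practice this would come from the index file.
    const offset = 123456789
    f, err := os.Open("dump.json.bz2")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if _, err := f.Seek(offset, io.SeekStart); err != nil {
        panic(err)
    }
    // Decompress from this stream onward and read JSON lines.
    scanner := bufio.NewScanner(bzip2.NewReader(f))
    scanner.Buffer(make([]byte, 0, 1024*1024), 256*1024*1024)
    if scanner.Scan() {
        fmt.Println(scanner.Text()) // the first article at this offset
    }
}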

Also, is there an API endpoint or Special page which can return the
same JSON for a single Wikipedia page? The JSON structure looks very
useful by itself (e.g., not in bulk).


Mitar


On Tue, Oct 19, 2021 at 4:57 PM Ariel Glenn WMF  wrote:
>
> I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
> October 17-18th are available for public download; see
> https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
> expect to make updated versions of these files available around the 1st/2nd
> of the month and the 20th/21st of the month, following the cadence of the
> standard SQL/XML dumps.
>
> This is still an experimental service, so there may be hiccups from time to
> time. Please be patient and report issues as you find them. Thanks!
>
> Ariel "Dumps Wrangler" Glenn
>
> [1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more
> about Wikimedia Enterprise and its API.
> ___
> Wiki-research-l mailing list -- wiki-researc...@lists.wikimedia.org
> To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org



--
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/


[Wikidata-bugs] [Maniphest] T115223: Provide wikidata downloads as multiple files to make access more robust and efficient

2021-12-31 Thread Mitar
Mitar added a comment.


  I learned today that Wikipedia has a nice approach with a multistream bz2 
archive <https://dumps.wikimedia.org/enwiki/> and an additional index file, 
which tells you the offset into the bz2 archive that you have to decompress as 
a chunk to access a particular page. Wikidata could do the same, just for items 
and properties. This would allow one to extract only those entities they care 
about. Multistream also enables one to decompress parts of the file in parallel 
on multiple machines, by distributing offsets between them. Wikipedia also 
provides the same multistream archive as multiple files so that one can even 
more easily distribute the whole dump over multiple machines. I like that 
approach.

TASK DETAIL
  https://phabricator.wikimedia.org/T115223

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, Mitar, abian, JanZerebecki, Hydriz, hoo, Halfak, NealMcB, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Svick, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


Re: [go-nuts] Is importing one main package in another main package supported?

2021-12-22 Thread Mitar
Hi!

On Tue, Dec 21, 2021 at 7:31 PM Ian Lance Taylor  wrote:
> Support for importing a main package was dropped from the go tool as
> part of https://golang.org/issue/4210.

I see. Thanks. The issue seems to be resolved in 2015, but the mailing
list post I referenced [1] is from December 2016. So I can assume the
post is simply false?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikP%2BbhEAKPkaUaT1Y%3DbCzn0pn01%3DN2gMO%3Dk%2BgsBn5m7eig%40mail.gmail.com.


[go-nuts] Is importing one main package in another main package supported?

2021-12-21 Thread Mitar
Hi!

I am trying to make a CLI tool which would have sub-commands and some
of them would call into other existing CLI tools (written in Go). The
issue is that those other commands sometimes have CLI parsing code as
part of the main package (e.g., example.com/cmd/example) and the hook
to call into the main (passing a list of arguments) is also in the
main package. I do not want to use a subprocess because that would
mean that a user would have to install those other CLI tools, too.

I understand why it is not reasonable to import main packages into
non-main packages, but my understanding is that importing a main
package into another main package should work, at least according to
[1]. But I cannot get it to work.

So is [1] wrong or has that changed since then?

Is there some other way for a `go install`ed tool to define a
dependency on another `go install`ed tool so that both are installed
as CLI tools?


Mitar

[1] https://groups.google.com/g/golang-nuts/c/frh9zQPEjUk/m/9tnVPAegDgAJ

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKLmikOosfu7Cj1NtFBeunMiBfcXhUkYiTx_Z2A05qg%3DyAemfg%40mail.gmail.com.


Re: Determining if a table really changed in a trigger

2021-11-06 Thread Mitar
Hi!

On Sat, Nov 6, 2021 at 2:43 PM Tom Lane  wrote:
> Mitar  writes:
> > Anyone? Any way to determine the number of affected rows in a statement 
> > trigger?
>
> Check the size of the transition relation.

Yes, this is what we are currently doing, but it looks very
inefficient if you want just the number, no? Or even if you want to
know if it is non-zero or zero.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-11-06 Thread Mitar
Hi!

On Wed, Oct 27, 2021 at 12:46 AM Mark Dilger
 wrote:
> I felt the same way about it, but after glancing quickly through the code and 
> docs nothing jumped out.  The information is clearly available, as it gets 
> returned at the end of the UPDATE statement in the "UPDATE 0" OR "UPDATE 3", 
> but I don't see how to access that from the trigger.  I might have to submit 
> a patch for that if nobody else knows a way to get it.  (Hopefully somebody 
> will respond with the answer...?)

Anyone? Any way to determine the number of affected rows in a statement trigger?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-27 Thread Mitar
Hi!

On Wed, Oct 27, 2021 at 12:56 PM Marcos Pegoraro  wrote:
>> Oh, very interesting. I thought that this is not possible because WHEN
>> condition on triggers does not have NEW and OLD. But this is a very
>> cool way to combine rules with triggers, where a rule can still
>> operate by row.
>
> That is not true

Sorry to be imprecise. In this thread I am interested in statement
triggers, so I didn't mention this explicitly here. Statement
triggers do not have NEW and OLD. But you can combine them with a
row-level rule and this then works well together.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-27 Thread Mitar
Hi!

On Wed, Oct 27, 2021 at 3:56 AM Michael Lewis  wrote:
> If you end up with no rows changing from an insert or delete, something seems 
> awry. Unless you mean 0 rows affected.

Isn't this the same? Isn't the number of rows affected the same as the
number of rows changing? For example:

DELETE FROM my_table where i=100;

would not change anything in your example. But probably this is just
terminology I have used badly.

> Do after statement triggers still execute? I suppose they very well might.

I have run the following and it seems statement triggers still execute
even if nothing changes:

postgres=# create table my_table (i integer, j json);
CREATE TABLE
postgres=# insert into my_table
  select gs::integer, '{"key":1}'::json
from generate_series(1,3) gs;
INSERT 0 3
postgres=# create function my_table_func () returns trigger as $$
declare
  have_rows boolean;
begin
  raise warning 'trigger called';
  if (tg_op = 'INSERT') then
select true into have_rows from new_values limit 1;
if have_rows then
  raise warning 'rows have changed';
end if;
  elsif (tg_op = 'UPDATE' or tg_op = 'DELETE') then
select true into have_rows from old_values limit 1;
if have_rows then
  raise warning 'rows have changed';
end if;
  end if;
  return null;
end
$$ language plpgsql;
CREATE FUNCTION
postgres=# create trigger my_table_trig_insert after insert on my_table
  referencing new table as new_values
  for each statement
  execute function my_table_func();
CREATE TRIGGER
postgres=# create trigger my_table_trig_update after update on my_table
  referencing old table as old_values
  for each statement
  execute function my_table_func();
CREATE TRIGGER
postgres=# create trigger my_table_trig_delete after delete on my_table
  referencing old table as old_values
  for each statement
  execute function my_table_func();
CREATE TRIGGER
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING:  trigger called
WARNING:  rows have changed
UPDATE 3
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING:  trigger called
WARNING:  rows have changed
UPDATE 3
postgres=# create trigger z_min_update
  before update on my_table
  for each row execute function suppress_redundant_updates_trigger();
CREATE TRIGGER
postgres=# update my_table set j = '{"key":2}'::jsonb;
WARNING:  trigger called
UPDATE 0
postgres=# update my_table set j = '{"key":3}'::jsonb;
WARNING:  trigger called
WARNING:  rows have changed
UPDATE 3
postgres=# delete from my_table where i = 100;
WARNING:  trigger called
DELETE 0
postgres=# insert into my_table select * from my_table where i = 100;
WARNING:  trigger called
INSERT 0 0

> Would the statement even execute if no rows get updated and that is prevented 
> with before update? I would assume null is being returned rather than old if 
> the trigger finds the row to be identical.

It looks like a statement trigger is always called, but the check
through REFERENCING matches the number of affected rows reported by
the psql shell. Also notice how the number of affected rows is
non-zero for a trivial update before suppress_redundant_updates_trigger
is installed, both through REFERENCING and in the psql shell.

That matches also documentation:

> ..., a trigger that is marked FOR EACH STATEMENT only executes once for any 
> given operation, regardless of how many rows it modifies (in particular, an 
> operation that modifies zero rows will still result in the execution of any 
> applicable FOR EACH STATEMENT triggers).

So it would be really cool to be able to access the number of affected
rows inside a trigger without the use of REFERENCING. Given that the
WHEN condition of a statement trigger is currently mostly useless
(because the condition cannot refer to any values in the table), maybe
providing something like an AFFECTED variable there would be the way
to go? So one could write:

CREATE TRIGGER my_trigger AFTER UPDATE ON my_table FOR EACH STATEMENT
WHEN AFFECTED <> 0 EXECUTE FUNCTION my_table_func();


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mitar
Hi!

On Wed, Oct 27, 2021 at 1:16 AM Mark Dilger
 wrote:
> If Mitar finds that suppress_redundant_updates_trigger is sufficient, that 
> may be a simpler solution.  Thanks for mentioning it.
>
> The suppress_redundant_updates_trigger uses memcmp on the old and new rows.  
> I don't know if memcmp will be sufficient in this case, since json can be 
> binary unequal and yet turn out to be equal once cast to jsonb.  I was using 
> the rule and casting the json column to jsonb before comparing for equality.

Very interesting, I didn't know about that trigger. Memcmp is OK for
my use case. This is why I am considering *= as well.

I am guessing that if I am already doing a row comparison on every
UPDATE before my AFTER trigger so that I do not run the trigger (the
rule-based approach suggested by Mark), it is probably better to do
the row comparison as a BEFORE trigger which prevents the UPDATE from
even happening. I already pay for the row comparison so at least I
could prevent the disk write as well. Do I understand that correctly?

So the only remaining question is how to prevent my statement trigger
from running if no rows end up being changed by INSERT/UPDATE/DELETE
without having to use REFERENCING.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mitar
Hi!

On Tue, Oct 26, 2021 at 10:55 PM Mark Dilger
 wrote:
> The trigger "my_table_trig" in the example is a per row trigger, but it 
> exists only to demonstrate that the rule has filtered out the appropriate 
> rows.  You can use the rule "my_table_rule" as written and a per statement 
> trigger, as here:

Oh, very interesting. I thought that this is not possible because WHEN
condition on triggers does not have NEW and OLD. But this is a very
cool way to combine rules with triggers, where a rule can still
operate by row.

Thank you for sharing this!

> Note that there is a performance cost to storing the old rows using the 
> REFERENCING clause of the trigger

Yea, by moving the trivial update check to a rule, I need REFERENCING
only to see if there were any changes at all. This seems a bit
excessive. Is there a way to check if any rows have been affected by
an UPDATE inside a per statement trigger without using REFERENCING?

> Note that I used equality and inequality rather than IS DISTINCT FROM and IS 
> NOT DISTINCT FROM in the design, but you should think about how NULL values 
> (old, new, or both) will behave in the solution you choose.

I have just now tested the following rule:

CREATE RULE filter_trivial_updates AS ON UPDATE TO my_table WHERE NEW
*= OLD DO INSTEAD NOTHING;

and it looks like it works well. It sidesteps the issue around the
equality operator for type json and also compares nulls as just
another value (which I would like). I am not sure how it compares
performance-wise with listing all columns and using the regular
equality operator.

I also notice that you check if a table has any rows with:

SELECT true INTO have_rows FROM old_values LIMIT 1;
IF have_rows THEN ...

Is this just a question of style or is this a better approach than my:

PERFORM * FROM old_values LIMIT 1;
IF FOUND THEN ...


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mitar
Hi!

On Tue, Oct 26, 2021 at 10:17 PM Mark Dilger
 wrote:
> I can't tell from your post if you want the trivial update to be performed, 
> but if not, would it work to filter trivial updates as:

No, I want to skip trivial updates (those which have not changed
anything). But my trigger is per statement, not per row, so I do not
think your approach works there? This is why I am making a more
complicated check inside the trigger itself.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mitar
Hi!

Thank you everyone for your responses. I investigated them.

I have also found composite type operators [1]. Is there no way to
tell the EXCEPT operator to use *= as its equality operator? *EXCEPT
would seem to be a useful operator to have. :-) I am not sure about
performance though. EXCEPT is generally fast, but probably because it
can use indices; I am not sure how fast *= is, given that it compares
binary representations. What is the experience of others with this
operator?


Mitar

[1] 
https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Determining if a table really changed in a trigger

2021-10-26 Thread Mitar
Hi!

I have a trigger like:

CREATE TRIGGER update_trigger AFTER UPDATE ON my_table REFERENCING NEW
TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE
FUNCTION trigger_function;

I would like to test inside trigger_function if the table really
changed. I have tried to do:

PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL
(TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
IF FOUND THEN
  ... changed ...
END IF;

But this fails if the table contains a JSON field with the error:

could not identify an equality operator for type json

The table has an unique index column, if that helps.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




[issue24739] allow argparse.FileType to accept newline argument

2021-10-25 Thread Mitar


Mitar  added the comment:

I think the issue is that it is hard to subclass it. Ideally, the call to open 
would be made through a new _open method which would then call open, so one 
could easily override that method if/when needed.

--
nosy: +mitar

___
Python tracker 
<https://bugs.python.org/issue24739>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[Wikidata-bugs] [Maniphest] T115223: Provide wikidata downloads as multiple files to make access more robust and efficient

2021-06-20 Thread Mitar
Mitar added a comment.


  In fact, this is not a problem, see 
https://phabricator.wikimedia.org/T222985#7164507
  
  pbzip2 is problematic and cannot decompress files in parallel if they were 
not compressed with pbzip2. But lbzip2 can. So using lbzip2 makes decompression 
of single-file dumps fast, and I am not sure it would be faster to have 
multiple files.

TASK DETAIL
  https://phabricator.wikimedia.org/T115223

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, Mitar, abian, JanZerebecki, Hydriz, hoo, Halfak, NealMcB, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Svick, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-20 Thread Mitar
Mitar added a comment.


  OK, so it seems the problem is in pbzip2. It is not able to decompress in 
parallel unless compression was made with pbzip2, too. But lbzip2 can 
decompress all of them in parallel.
  
  See:
  
$ time bunzip2 -c -k latest-lexemes.json.bz2 > /dev/null

real1m0.101s
user0m59.912s
sys 0m0.180s
$ time pbzip2 -d -k -c latest-lexemes.json.bz2 > /dev/null

real0m57.662s
user0m57.792s
sys 0m0.180s
$ time lbunzip2 -c -k latest-lexemes.json.bz2 > /dev/null

real0m13.346s
user1m35.951s
sys 0m2.342s
$ lbunzip2 -c -k latest-lexemes.json.bz2 > serial.json
$ pbzip2 -z < serial.json > parallel.json.bz2
$ time lbunzip2 -c -k parallel.json.bz2 > /dev/null

real0m16.270s
user1m43.004s
sys 0m2.262s
$ time pbzip2 -d -c -k parallel.json.bz2 > /dev/null

real0m17.324s
user1m52.946s
sys 0m0.659s
  
  Size is very similar:
  
$ ll parallel.json.bz2 latest-lexemes.json.bz2 
-rw-rw-r-- 1 mitar mitar 168657719 Jun 15 20:36 latest-lexemes.json.bz2
-rw-rw-r-- 1 mitar mitar 168840138 Jun 20 07:35 parallel.json.bz2

TASK DETAIL
  https://phabricator.wikimedia.org/T222985

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, bennofs, 
Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-20 Thread Mitar
Mitar added a comment.


  Are you saying that existing wikidata json dumps can be decompressed in 
parallel if using lbzip2, but not pbzip2?

TASK DETAIL
  https://phabricator.wikimedia.org/T222985

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, bennofs, 
Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T115223: Provide wikidata downloads as multiple files to make access more robust and efficient

2021-06-19 Thread Mitar
Mitar added a comment.


  I am realizing that maybe the problem is just that the bzip2 compression is 
not multistream but singlestream. Moreover, using newer compression algorithms 
like zstd might decrease decompression time even further, removing the need for 
multiple files altogether. See https://phabricator.wikimedia.org/T222985#7163885

TASK DETAIL
  https://phabricator.wikimedia.org/T115223

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, Mitar, abian, JanZerebecki, Hydriz, hoo, Halfak, NealMcB, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Svick, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-19 Thread Mitar
Mitar added a comment.


  As a reference see also this discussion 
<https://www.wikidata.org/wiki/Wikidata_talk:Database_download#Dumps_cannot_be_decompressed_in_parallel>.
  
  I think the problem with bzip2 is that it is currently singlestream so one 
cannot really decompress it in parallel. Based on this answer 
<https://www.wikidata.org/wiki/Wikidata_talk:Database_download#Reading_the_JSON_dump_with_Python>
 it seems that this was done on purpose, but since 2016 maybe we do not have to 
worry about compatibility anymore and just change bzip2 to be multistream? For 
example, by using this tool <https://linux.die.net/man/1/pbzip2>.
  
  But from my experience (from other contexts), zstd is really good. +1 on 
providing that as well, if possible from disk space perspective.
  
  I think by supporting parallel decompression, then issue 
https://phabricator.wikimedia.org/T115223 could be addressed as well.

TASK DETAIL
  https://phabricator.wikimedia.org/T222985

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, bennofs, 
Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread Mitar
Mitar added a comment.


  Are you sure `lastrevid` works like that for the whole dump? I think that 
dump is made from multiple shards, so it might be that `lastrevid` is not 
consistent across all items?

TASK DETAIL
  https://phabricator.wikimedia.org/T209390

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Sascha, Mitar, ArielGlenn, Smalyshev, Addshore, Invadibot, maantietaja, 
jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


Re: Why is writing JSONB faster than just JSON?

2021-04-23 Thread Mitar
Hi!

On Fri, Apr 23, 2021 at 10:49 AM Francisco Olarte
 wrote:
> Of course, I did not follow the thread to deeply, just pointed that in
> case you were assuming that was not going to be stored compressed.

Thanks for pointing that out. I was just trying to make sure I am
understanding you correctly and that we are all on the same page about
implications. It seems we are.

> Also, not surprised JSONB ends up being fatter,

Yes, by itself this is not surprising. The reason I mentioned it is
that in my original post in this thread I wrote that I am surprised
that inserting into a JSONB column seems observably faster than into a
JSON or TEXT column (for the same data), and I wonder why that is. One
theory presented was that JSONB might compress better, so there is
less IO and insertion is faster. But JSONB does not look more
compressed (I had not measured the size in my original benchmark), so
now I am searching for other explanations for the results of my
benchmark.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Why is writing JSONB faster than just JSON?

2021-04-23 Thread Mitar
Hi!

On Fri, Apr 23, 2021 at 10:28 AM Francisco Olarte
 wrote:
> A fast look at the link. It seems to be long string of random LOWER
> CASE HEX DIGITS. A simple huffman coder can probably put it in 5 bits
> per char, and a more sophisticated algorithm can probably approach 4.

But this compressibility would apply to both JSONB and JSON column
types, no? Moreover, it looks like the JSONB column type ends up
larger on disk.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Why is writing JSONB faster than just JSON?

2021-04-23 Thread Mitar
Hi!

On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > My point was that for JSON, after validating that the input is
> > syntactically correct, we just store it as-received.  So in particular
> > the amount of whitespace in the value would depend on how the client
> > had chosen to format the JSON.  This'd affect the stored size of
> > course, and I think it would have an effect on compression time too.
>
> Yes, I got it and just wanted to confirm you were right - this was the
> reason I've observed slowdown trying to reproduce the report.

Thank you for trying to reproduce the report. I did a bit more digging
myself and I am still confused.

First, it is important to note that the JSON I am using contains
primarily random strings as values, so it is not really something
which is easy to compress. See the example at [1]. I have realized
though that in the previous benchmark I had been using the same JSON
document and inserting it multiple times, so compression might have
worked across documents or something. So I ran a version of the
benchmark with different JSONs being inserted (but with the same
structure, just with random strings as values). There was no
difference.

Second, as you see from [1], the JSON representation I am using is
really compact and has no extra spaces. I also used
pg_total_relation_size to get the size of the table after inserting
10k rows and the numbers are similar, with JSONB being slightly larger
than others. So I think the idea of compression does not hold.

So I do not know what is happening and why you cannot reproduce it.
Could you maybe explain a bit how you are trying to reproduce it?
Directly from the psql console? Are you using the same version as me
(13.2)?

Numbers with inserting the same large JSON 10k times:

Type: jsonb
Mean: 200243.1
Stddev: 1679.7741187433503
Size: { pg_total_relation_size: '4611792896' }
Type: json
Mean: 256938.5
Stddev: 2471.9909890612466
Size: { pg_total_relation_size: '4597833728' }
Type: text
Mean: 248175.3
Stddev: 376.677594236769
Size: { pg_total_relation_size: '4597833728' }

Inserting different JSON 10k times:

Type: jsonb
Mean: 202794.5
Stddev: 978.5346442512907
Size: { pg_total_relation_size: '4611792896' }
Type: json
Mean: 259437.9
Stddev: 1785.8411155531167
Size: { pg_total_relation_size: '4597833728' }
Type: text
Mean: 250060.5
Stddev: 912.9207249263213
Size: { pg_total_relation_size: '4597833728' }

[1] https://gitlab.com/mitar/benchmark-pg-json/-/blob/master/example.json


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Why is writing JSONB faster than just JSON?

2021-04-14 Thread Mitar
Hi!

I have a project where, among other data, we want to store static JSON
objects which can get pretty large (10-100 KB). I was trying to
evaluate how it would work if we simply stored them as an additional
column in a PostgreSQL database. So I made a benchmark [1]. The
results surprised me a bit and I am writing here because I would like
to understand them. Namely, it looks like writing into a jsonb typed
column is 30% faster than writing into a json typed column. Why is
that? Does jsonb not require parsing and conversion of the JSON? That
should be slower than just storing the blob as-is?

[1] https://gitlab.com/mitar/benchmark-pg-json


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




[Wikidata-bugs] [Maniphest] T115223: Provide wikidata downloads as multiple files to make access more robust and efficient

2021-04-03 Thread Mitar
Mitar added a comment.


  Thank you for redirecting me to this issue. As I mentioned in T278204 
<https://phabricator.wikimedia.org/T278204> my main motivation is in fact not 
downloading in parallel, but processing in parallel. Just decompressing that 
large file takes half a day on my machine. If I can instead use 12 machines on 
12 splits, for example, I can do that decompression (or some other processing) 
in one hour instead.

TASK DETAIL
  https://phabricator.wikimedia.org/T115223

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, Mitar, abian, JanZerebecki, Hydriz, hoo, Halfak, NealMcB, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Svick, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-23 Thread Mitar
Mitar added a comment.


  I realized I have exactly the same need as the poster on StackOverflow: get a 
dump and then use the real-time feed to keep it updated. But you have to know 
where to start with the real-time feed through EventStreams, using historical 
consumption 
<https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Historical_Consumption>
 to resume from the point the dump was made.

TASK DETAIL
  https://phabricator.wikimedia.org/T209390

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ArielGlenn, Smalyshev, Addshore, Invadibot, maantietaja, jannee_e, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T278204: Provide Wikidata dumps as multiple files

2021-03-23 Thread Mitar
Mitar updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T278204

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, hoo, Mitar, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, 
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T278204: Provide Wikidata dumps as multiple files

2021-03-23 Thread Mitar
Mitar updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T278204

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Addshore, hoo, Mitar, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, 
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T278204: Provide Wikidata dumps as multiple files

2021-03-22 Thread Mitar
Mitar created this task.
Mitar added projects: Wikidata, Dumps-Generation.
Restricted Application added a project: wdwb-tech.

TASK DESCRIPTION
  My understanding is that dumps are currently in fact already produced by 
multiple shards and then combined into one file. I wonder why multiple files 
are simply not kept, because that would also make it easier to process dumps in 
parallel over multiple files. There are already no guarantees on the order of 
documents in dumps. Currently parallel processing is hard because it is hard to 
split a compressed file into multiple chunks without decompressing the file 
first (and then potentially recompressing the chunks). So, given that the dump 
size has grown over time, maybe it is time for it to be provided as multiple 
files, each file at some reasonable maximum size?

TASK DETAIL
  https://phabricator.wikimedia.org/T278204

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps

2021-03-21 Thread Mitar
Mitar added a comment.


  I see that the API does return the `modified` field: 
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q1

TASK DETAIL
  https://phabricator.wikimedia.org/T278031

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, Aklapper, Invadibot, maantietaja, Akuckartz, darthmon_wmde, Nandana, 
Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-21 Thread Mitar
Mitar added a comment.


  Personally, I would love to have for each item in the dump a timestamp when 
it was created and a timestamp when it was last modified.
  
  Related: https://phabricator.wikimedia.org/T278031

TASK DETAIL
  https://phabricator.wikimedia.org/T209390

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ArielGlenn, Smalyshev, Addshore, Invadibot, maantietaja, jannee_e, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-21 Thread Mitar
Restricted Application added a project: wdwb-tech.

TASK DETAIL
  https://phabricator.wikimedia.org/T209390

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mitar
Cc: Mitar, ArielGlenn, Smalyshev, Addshore, Invadibot, maantietaja, jannee_e, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Bug 1551776] Re: Calendar: high cpu usage while open

2020-12-18 Thread Mitar
This is still present for me in Ubuntu 20.04, so I do not think it is
resolved.

Moreover, over time the memory usage of the app grows substantially. I
suspect there is a memory leak.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1551776

Title:
  Calendar: high cpu usage while open

To manage notifications about this bug go to:
https://bugs.launchpad.net/gnome-calendar/+bug/1551776/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1551776] Re: Calendar: high cpu usage while open

2020-12-18 Thread Mitar
This is still present for me in Ubuntu 20.04, so I do not think it is
resolved.

Moreover, over time the memory usage of the app grows substantially. I
suspect there is a memory leak.

-- 
You received this bug notification because you are a member of Ubuntu
Desktop Bugs, which is subscribed to gnome-calendar in Ubuntu.
https://bugs.launchpad.net/bugs/1551776

Title:
  Calendar: high cpu usage while open

To manage notifications about this bug go to:
https://bugs.launchpad.net/gnome-calendar/+bug/1551776/+subscriptions

-- 
desktop-bugs mailing list
desktop-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/desktop-bugs

[Desktop-packages] [Bug 1551776] Re: Calendar: high cpu usage while open

2020-12-18 Thread Mitar
This is still present for me in Ubuntu 20.04, so I do not think it is
resolved.

Moreover, over time the memory usage of the app grows substantially. I
suspect there is a memory leak.

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to gnome-calendar in Ubuntu.
https://bugs.launchpad.net/bugs/1551776

Title:
  Calendar: high cpu usage while open

Status in GNOME Calendar:
  Fix Released
Status in gnome-calendar package in Ubuntu:
  Triaged

Bug description:
  STEPS:
  1. Fresh flash of 16.04 daily
  2. Open calendar
  3. Open top
  4. Note the 3 apps at the top of the hit list

  EXPECTED:
  I expect to see a small spike in cpu usage as a app open and then it to lower 
to a reasonable level

  ACTUAL:
  Calendar uses 22%
  X 41%
  Compiz 35%

  This cripples even a relatively performent machine

  See screenshots

To manage notifications about this bug go to:
https://bugs.launchpad.net/gnome-calendar/+bug/1551776/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


[Bug 1770886] Re: gnome-calendar runs in background and uses a lot of memory

2020-12-18 Thread Mitar
This is still a problem in Ubuntu 20.04.

I am also noticing high CPU usage, and the UI often triggers the "this
app is frozen, kill it?" message.

-- 
You received this bug notification because you are a member of Ubuntu
Desktop Bugs, which is subscribed to gnome-calendar in Ubuntu.
https://bugs.launchpad.net/bugs/1770886

Title:
  gnome-calendar runs in background and uses a lot of memory

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/gnome-calendar/+bug/1770886/+subscriptions

-- 
desktop-bugs mailing list
desktop-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/desktop-bugs

[Desktop-packages] [Bug 1770886] Re: gnome-calendar runs in background and uses a lot of memory

2020-12-18 Thread Mitar
This is still a problem in Ubuntu 20.04.

I am also noticing high CPU usage, and the UI often triggers the "this
app is frozen, kill it?" message.

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to gnome-calendar in Ubuntu.
https://bugs.launchpad.net/bugs/1770886

Title:
  gnome-calendar runs in background and uses a lot of memory

Status in gnome-calendar package in Ubuntu:
  Confirmed

Bug description:
  Some minutes after I logged in I realized that gnome-calendar runs in
  the background and uses more than 400MB memory. I don't launched it.
  Is it normal?

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: gnome-calendar 3.28.1-1ubuntu2
  ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
  Uname: Linux 4.15.0-20-generic x86_64
  NonfreeKernelModules: nvidia
  ApportVersion: 2.20.9-0ubuntu7
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Sat May 12 20:39:27 2018
  InstallationDate: Installed on 2017-10-20 (204 days ago)
  InstallationMedia: Ubuntu 16.04.2 LTS "Xenial Xerus" - Release amd64 
(20170215.2)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=hu_HU.UTF-8
   SHELL=/bin/bash
  SourcePackage: gnome-calendar
  UpgradeStatus: Upgraded to bionic on 2018-04-15 (27 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/gnome-calendar/+bug/1770886/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


Re: Persist MVCC forever - retain history

2020-07-04 Thread Mitar
Hi!

On Fri, Jul 3, 2020 at 12:29 AM Konstantin Knizhnik
 wrote:
> Did you read this thread:
> https://www.postgresql.org/message-id/flat/78aadf6b-86d4-21b9-9c2a-51f1efb8a499%40postgrespro.ru
> I have proposed a patch for supporting time travel (AS OF) queries.
> But I didn't fill a big interest to it from community.

Oh, you went much further than me in this thinking. Awesome!

I am surprised that you say you did not see much interest. My reading
of the thread is the opposite: there was quite some interest, but
there are technical challenges to overcome. So did you give up on that
work?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 7:51 PM Mark Dilger  wrote:
> I expect these issues to be less than half what you would need to resolve, 
> though much of the rest of it is less clear to me.

Thank you for this insightful input. I will think it over.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:16 PM Thomas Munro  wrote:
> This was a research topic in ancient times (somewhere I read that in
> some ancient version, VACUUM didn't originally remove tuples, it moved
> them to permanent write-only storage).  Even after the open source
> project began, there was a "time travel" feature, but it was removed
> in 6.2:

Very interesting. Thanks for sharing.

> There aren't indexes on those things.

Oh. My information is based on what I read in [1]. That is where I
got the idea that, if PostgreSQL already maintains those extra columns
and indices, there is no point in replicating that by copying
everything to another table. So this is not true? Or not true anymore?

> If you want to keep track of all changes in a way that lets you query
> things as of historical times, including joins, and possibly including
> multiple time dimensions ("on the 2nd of Feb, what address did we
> think Fred lived at on the 1st of Jan?") you might want to read
> "Developing Time-Oriented Database Applications in SQL" about this,

Interesting. I checked it out a bit. I think this is not exactly what
I am searching for. My main motivation is reactive web applications,
where I can push changes of a (sub)state of the database to the web
app when that (sub)state changes. And if the web app is offline for
some time, it can come back and resync all the missed changes as well.
Moreover, the changes themselves are important (not just the last
state) because they allow merging with a potentially changed local
state in the web app while it was offline. So in a way it is logical
replication and replay, but at the database-to-client level.

[1] https://eng.uber.com/postgres-to-mysql-migration/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:12 PM David G. Johnston
 wrote:
> Even for a single table how would you go about specifying this in a 
> user-friendly way?  Then consider joins.

One general answer: you use query rewriting. But what is user-friendly
depends on the use case. For me, the main motivation for this is that
I would like to sync database and client state, including all
revisions of the data. It is then pretty easy to query, based on this
row revision, which rows are newer and sync them over. And then I can
show diffs of changes through time for a particular row.

I agree that reconstructing joins at one particular moment in the
past requires more information. But other solutions (like copying all
changes to a separate table in triggers) also require that
information: adding a timestamp column and so on. So I can just have a
timestamp column in my original (and only) table and a BEFORE trigger
which populates it with a timestamp. Then, at a later time, when I
have all revisions of a row in one table, I can also query based on
the timestamp, and the PostgreSQL revision column helps me address the
issue of two changes happening at the same timestamp.
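
Something like this is what I have in mind, just as a rough sketch
(the table, column, and trigger names here are made up):

CREATE TABLE docs (
  id int NOT NULL,
  revision int NOT NULL,
  changed_at timestamptz NOT NULL,
  body jsonb NOT NULL,
  PRIMARY KEY (id, revision)
);

CREATE FUNCTION set_changed_at() RETURNS trigger AS $$
BEGIN
  -- populate the timestamp column on every insert or update
  NEW.changed_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER docs_changed_at
  BEFORE INSERT OR UPDATE ON docs
  FOR EACH ROW EXECUTE FUNCTION set_changed_at();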

I still gain that a) I do not have to copy rows to another table and
b) I do not have to vacuum. The only downside is that I have to
rewrite queries for the latest state so that they operate only on the
latest state (or maybe PostgreSQL could continue to do this for me as
it does now, and just allow me to also access the old versions).

>  If by “this” you mean leveraging MVCC you don’t; it isn’t suitable for 
> persistent temporal data.

Why not?

> The fundamental missing piece is that there is no concept of timestamp in 
> MVCC.

That can be added using a BEFORE trigger.

> Plus, wrap-around and freezing aren’t just nice-to-have features.

Oh, I forgot about that. The transaction ID is still just 32 bits? So
then for such a table with permanent MVCC this would have to be
increased, to 64 bits or something. Then one would not have to do
wrap-around protection, no?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

(Sorry if this was already discussed, it looks pretty obvious, but I
could not find anything.)

I was thinking and reading about how to design a schema to keep
records of all changes which happen to a table, at row granularity,
when I realized that all of this is already done for me by PostgreSQL
MVCC. All rows (tuples) are already stored, with an internal version
field as well.

So I wonder: how could I hack PostgreSQL to disable vacuuming of a
table, so that all tuples persist forever, and how could I make those
internal columns visible so that I could run queries asking for
results at a particular historical version of the table state? My
understanding is that indices already index over those internal
columns as well, so queries over historical versions would be
efficient too. Am I missing something which would make this not
possible?

Is this something for which I would have to run a custom version of
PostgreSQL, or is it possible through an extension of some sort?
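
To illustrate a part of what I am after (a rough sketch, with a
made-up table name): the hidden system columns can already be selected
explicitly, and autovacuum can be disabled per table, although that
alone does not give me historical queries, and anti-wraparound vacuum
still runs:

-- the per-tuple system columns PostgreSQL already keeps
SELECT xmin, xmax, ctid, * FROM docs WHERE id = 123;

-- turn off autovacuum for this one table
ALTER TABLE docs SET (autovacuum_enabled = false);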


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




[issue22848] Subparser help does not respect SUPPRESS argument

2020-05-02 Thread Mitar


Change by Mitar :


--
nosy: +mitar

___
Python tracker 
<https://bugs.python.org/issue22848>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[Desktop-packages] [Bug 1813131] Re: i965_drv_video.so doesn't load any more if a Wayland server is present [failed to resolve wl_drm_interface(): /lib/x86_64-linux-gnu/libEGL_mesa.so.0: undefined sym

2020-05-01 Thread Mitar
I can confirm this is not working on Bionic. vainfo output:

libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_1
error: failed to resolve wl_drm_interface(): 
/usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0: undefined symbol: wl_drm_interface
libva error: /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so init failed
libva info: va_openDriver() returns -1
vaInitialize failed with error code -1 (unknown libva error),exit

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1813131

Title:
  i965_drv_video.so doesn't load any more if a Wayland server is present
  [failed to resolve wl_drm_interface(): /lib/x86_64-linux-
  gnu/libEGL_mesa.so.0: undefined symbol: wl_drm_interface]

Status in Libva:
  Fix Released
Status in intel-vaapi-driver package in Ubuntu:
  Fix Released
Status in libva package in Ubuntu:
  Fix Released
Status in mesa package in Ubuntu:
  Invalid
Status in intel-vaapi-driver source package in Bionic:
  Confirmed
Status in libva source package in Bionic:
  Confirmed
Status in mesa source package in Bionic:
  Invalid

Bug description:
  If a Wayland server is present (anywhere on the system including even
  the gdm3 login screen) then i965_drv_video.so fails to initialize:

  $ vainfo 
  libva info: VA-API version 1.3.0
  libva info: va_getDriverName() returns 0
  libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
  libva info: Found init function __vaDriverInit_1_2
  error: failed to resolve wl_drm_interface(): 
/lib/x86_64-linux-gnu/libEGL_mesa.so.0: undefined symbol: wl_drm_interface
  libva error: /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so init failed
  libva info: va_openDriver() returns -1
  vaInitialize failed with error code -1 (unknown libva error),exit

  $ mpv bbb_sunflower_2160p_60fps_normal.mp4 
  Playing: bbb_sunflower_2160p_60fps_normal.mp4
   (+) Video --vid=1 (*) (h264 3840x2160 60.000fps)
   (+) Audio --aid=1 (*) (mp3 2ch 48000Hz)
   Audio --aid=2 (*) (ac3 6ch 48000Hz)
  File tags:
   Artist: Blender Foundation 2008, Janus Bager Kristensen 2013
   Comment: Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
   Composer: Sacha Goedegebure
   Genre: Animation
   Title: Big Buck Bunny, Sunflower version
  error: failed to resolve wl_drm_interface(): 
/lib/x86_64-linux-gnu/libEGL_mesa.so.0: undefined symbol: wl_drm_interface
  [vaapi] libva: /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so init failed

  Meanwhile, it continues to work after you have logged into a Xorg
  session.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.04
  Package: i965-va-driver 2.2.0-0ubuntu1
  ProcVersionSignature: Ubuntu 4.18.0-11.12-generic 4.18.12
  Uname: Linux 4.18.0-11-generic x86_64
  ApportVersion: 2.20.10-0ubuntu19
  Architecture: amd64
  Date: Thu Jan 24 16:54:21 2019
  InstallationDate: Installed on 2018-12-04 (51 days ago)
  InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Alpha amd64 (20181203)
  SourcePackage: intel-vaapi-driver
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/libva/+bug/1813131/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


Re: [tor-relays] Issues reaching gigabit relay speeds

2019-10-31 Thread Mitar
Hi!

On Thu, Oct 31, 2019 at 6:21 AM Matt Traudt  wrote:
> - In an ideal world you won't get more load than your fair share.
> Consider a hypothetically large Tor network with loads of high-capacity
> relays. Every relay may be capable of 1 Gbps but only see 10 Mbps, yet
> there is absolutely no problem.

Thank you. Yes, I understand that if there is more capacity, then the
load will not fully saturate the available capacity. So it might
simply be that there are plenty of relays but not enough exits.

Then my question is different: how could I test and make sure that my
nodes are able to utilize a full gigabit if such demand ever arises?
So that I can be sure that they are ready and available, and that
there is not some other bottleneck somewhere on the nodes themselves?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays


Re: Automatically parsing in-line composite types

2019-10-30 Thread Mitar
Hi!

On Wed, Oct 30, 2019 at 3:06 PM Merlin Moncure  wrote:
> It looks it up from the database.

Yes, this is how I started doing it in my prototype as well.

> Correct. Only declared (via CREATE TYPE) composite types will work due
> to protocol limitations.

Exactly. This is where I got stuck, so this is why I started this thread. :-(

> So if you decided to scratch in itch and create a postgres
> BSON type, no one would likely use it, since the chances of adoption
> in core are slim to none.

Yea. :-( So we get back to square one. :-(

One other approach I was investigating was developing a Babel-like
transpiler for PostgreSQL SQL, so that I could have plugins which
would rewrite SQL queries to automatically encode values in JSON, and
then parse them back out once the results arrive. Because yes, as you
note, JSON is the only stable and supported format there is across all
installations (apart from the wire format, which has limitations). So
mapping to it and back, but without the developer having to think
about it, might be the best solution.
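
To illustrate the kind of rewriting I have in mind (just a sketch,
reusing the posts/comments example from my original message; the exact
JSON shape is an arbitrary choice):

-- what the user writes
SELECT _id, body,
  (SELECT array_agg(ROW(comments._id, comments.body))
   FROM comments WHERE comments.post_id = posts._id) AS comments
FROM posts;

-- what would actually be sent to the server
SELECT _id, body,
  (SELECT jsonb_agg(jsonb_build_object('_id', comments._id,
                                       'body', comments.body))
   FROM comments WHERE comments.post_id = posts._id) AS comments
FROM posts;

The client side would then parse the returned JSON column and hand
records back to the caller transparently.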


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Automatically parsing in-line composite types

2019-10-30 Thread Mitar
Hi!

On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure  wrote:
> Check out libpqtypes: https://github.com/pgagarinov/libpqtypes

Interesting. I have looked at the code a bit, but I cannot find how it
determines the type of inline compound types, like the ones that
appear in my original SQL query example. Could you maybe point me to
the piece of code there handling that? Because to my
understanding/exploration, that information is simply not exposed to
the client in any way. :-(

> it does exactly what you want. It's a wrapper for libpq that provides
> client side parsing for the binary protocol with array and composite
> type parsing.

It looks to me like it parses composite types only if they are
registered composite types. But not, for example, the ones you get if
you project a subset of fields from a table in a subquery. That has no
registered composite type?

Also, how are you handling discovery of registered types: do you read
them on demand from the database? They are not provided over the wire?

> Virtually any
> non-C client application really ought to be using json rather than the
> custom binary structures libpqtyps would provide.

I thought that initially, too, but then found out that JSON has some
heavy limitations because the implementation in PostgreSQL is
standards-based. There is also no hook to do custom encoding of
non-JSON values, so binary blobs are converted in an ugly way (base64
would be better). You also lose a lot of meta-information, because
everything non-JSON gets converted to strings automatically, like
knowing which value is a date. I think MongoDB with BSON made much
more sense here. It looks like a good balance between the simplicity
of the JSON structure and adding a few more useful data types.

But yes, JSON is also great because clients often have optimized JSON
readers, which can beat any other binary serialization format. In
node.js, it is simply the fastest way there is to transfer data:

https://mitar.tnode.com/post/in-nodejs-always-query-in-json-from-postgresql/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Automatically parsing in-line composite types

2019-10-29 Thread Mitar
Hi!

On Tue, Oct 29, 2019 at 11:33 AM Tom Lane  wrote:
> [ shrug... ]  In a world where stability of the wire protocol were
> of zero value, maybe we would do that.  In the real world, don't
> hold your breath.

Oh, yes. I would hope this would be possible in a backwards-compatible
way. I am not familiar enough with the wire protocol to know the
answer to that, though.

> Clients would also
> have to be prepared to parse and de-escape the data representation,
> which is not trivial in either text or binary cases.

Yes, but currently they cannot be prepared. They simply lack the
necessary information. So if they are not prepared, then the state is
the same as it is currently: they get some composite type in its
encoded representation as a value. But if they are prepared, they have
the necessary metadata to parse it.

> On the whole I think it's generally better practice to explode your
> composite types into separate fields for transmission to the client.

The issue here is that it is really hard to make a general client for
PostgreSQL. A user might want to run an arbitrary SQL query. I would
like to be able to parse the result automatically, without the user
having to additionally specify how to parse it, without requiring them
to change the SQL query, and without showing them the encoded
representation directly (which is not very user friendly).

I agree that in simple cases one could just change the SQL query, but
that is not always possible. For example, aggregating a related table
into an array is very useful because it makes the amount of data
transmitted over the wire much smaller (instead of having to repeat
the contents of the main table's rows again and again).

> Note that the cases where JSON or XML shine are where you don't
> necessarily have a consistent set of fields in different instances
> of the composite values.  Even if we did extend RowDescription to
> support describing composites' sub-fields, it wouldn't be in
> much of a position to deal with that.

Yes, but that case is already handled: you just have a column of type
"JSON" (or "JSONB") and it is clear how to automatically parse that.
What I am missing is a way to automatically parse composite types.
Those are generally not completely arbitrary, but are defined by the
query, not by the data.

What would be the next step to move this further in some direction?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Automatically parsing in-line composite types

2019-10-29 Thread Mitar
Hi!

On Tue, Oct 29, 2019 at 9:06 AM Fabio Ugo Venchiarutti
 wrote:
> You can use subqueries and array_agg() to deepen your output tree all
> the way to a stack overflow, a single _to_json() call at the
> top will recursively traverse and convert whatever you feed it.

Yes, what you are describing is exactly the sad state of things: the
only way to meaningfully retrieve inline composite types, which are
made when one aggregates things like that or subselects a set of
fields from a table in a sub-query, is to convert the whole thing to
JSON and transmit it that way. This is the only way you can parse
things on the client, because if you leave it as the raw composite
type encoding, you cannot really parse it correctly on the client in
all cases without knowing what types are stored inside those composite
values you are getting.

But JSON is not a lossless transport format: it does not support the
full floating point spec (no infinity, no NaNs), and for many types of
fields it just converts to a string representation, which can be
problematic, for example, if you have binary blobs.

So no, JSON is a workaround, but it is sad that we should have to use
it. PostgreSQL seems to be almost there with the support for composite
types and nested query results, only it seems you cannot really parse
the results out. I mean, why does PostgreSQL even have its own binary
format for results then? It could just transmit everything as JSON.
:-) But that does not really work for many data types.

I think RowDescription should be extended to provide full recursive
metadata about all data types. That would be the best way to do it.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Automatically parsing in-line composite types

2019-10-29 Thread Mitar
Hi!

On Tue, Oct 29, 2019 at 5:23 AM Dave Cramer  wrote:
> Reading the RowDescription is the only way I am aware of.

But that provides only the types for the top-level fields, not the
inline composite types. If your top-level field is a registered
composite type then yes, it works out if you then go and read the
definitions of those types from the system tables. But for any other
case, where you for example subselect a list of columns from a table
in a sub-query, it does not work out.

I think that ideally, with the introduction of composite types into
PostgreSQL, RowDescription should have been extended to provide
information for composite types as well, recursively. That way you
would not even have to go and fetch additional information about the
types, potentially hitting race conditions.
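
(For a registered composite type, this is roughly the extra lookup a
client has to do today against the system catalogs, given the type's
OID from RowDescription; just a sketch:)

SELECT a.attname, a.atttypid
FROM pg_type t
JOIN pg_attribute a ON a.attrelid = t.typrelid
WHERE t.oid = $1
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;

For an in-line ROW(...) value there is nothing like this to look up,
because its fields are not described anywhere in the catalogs.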


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Automatically parsing in-line composite types

2019-10-23 Thread Mitar
Hi!

Bumping my previous question. I find it surprising that this
information seemingly cannot be reconstructed by the client, when the
server has to have it internally. Is this a new feature request, or am
I missing something?

> I am trying to understand how could I automatically parse an in-line
> composite type. By in-line composite type I mean a type corresponding
> to ROW. For example, in the following query:
>
> SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body))
> FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts
>
> It looks like I can figure out that "comments" is an array of records.
> But then there is no way really to understand how to parse those
> records? So what are types of fields in the record?
>
> I start the parsing process by looking at types returned in
> RowDescription message and then reading descriptions in pg_type table.
>
> Is there some other way to get full typing information of the result I
> am assuming is available to PostgreSQL internally?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

On Thu, Oct 17, 2019 at 5:38 PM Jens Alfke  wrote:
> Why should SQLite make changes, which would introduce performance problems if 
> used, just to save your particular application the trouble of concatenating 
> some vectors into single columns, when it uses SQLite for an edge use-case 
> that’s pretty far removed from its main purpose?

Then maybe the section "File archive and/or data container" in
"Appropriate Uses For SQLite" should explain that this is not the
purpose of SQLite anymore. Because "SQLite is a good solution for any
situation that requires bundling diverse content into a self-contained
and self-describing package for shipment across a network." seems to
hold only when "diverse" means a table with fewer than 2000 columns.
Somehow, describing a table with key/value columns can hardly be
called self-describing.

I am being ironic on purpose, because I am not sure that talking about
the "main purpose" is really a constructive conversation here when
there is a list of many "non-main" but still suggested use cases for
SQLite. Not to mention the "Data analysis" use case, where again,
someone like me who is used to doing analysis on datasets with many
columns would now have to change their analysis algorithms to adapt to
the limited number of columns. It does not seem that putting vectors
into single columns would really enable many "Data analysis" options
inside SQLite. I am even surprised that it says "Many bioinformatics
researchers use SQLite in this way." With a limit of 2000 columns this
is a very strange claim. I would love to see a reference here and see
how they do that. I might learn something new.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

Oh, or we could just split the CSV into separate lines and then just
store one line per SQLite row, in a single column. Then we do not have
to use JSON or anything like it.

That would work for CSV files. For other types of inputs we might be
able to find some similar approach.

So in general the main use case would be sub-sampling rows.
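
A sketch of what I mean (names are made up):

CREATE TABLE lines (
  id INTEGER PRIMARY KEY,
  line TEXT NOT NULL
);

-- sub-sampling, e.g. keeping every 100th line
SELECT line FROM lines WHERE id % 100 = 0;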


Mitar

On Thu, Oct 17, 2019 at 4:11 PM Donald Griggs  wrote:
>
> So if character-separated values (CSV-ish) were originally your preferred
> import format, would using that format for the blob's work for you?
>
> E.g., Suppose you need to index the first two fields only, and so can use a
> blob column for the bulk of the record.  If the records were supplied as:
>  MyFirstField~MySecondField~thousands|of|data|items|...
> and you imported these records into your 3-column table, defining tilde (~)
> as the separator, you could retain a simple format.
>
> If ever needed, you can easily export the table with the indexed columns
> treated like the bulk of the data
> from the sqlite3 command line utility:
> .mode list
> .separator |
> SELECT * FROM MyTable;
>
>MyFirstField|MySecondField|thousands|of|data|items|...
>
>
>
>
>
>
>
> On Thu, Oct 17, 2019 at 9:10 AM Mitar  wrote:
>
> > Hi!
> >
> > On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > > my suggestion would be to store them as JSON in a blob, and use the JSON
> > > functions of SQLite to extract the data
> >
> > JSON has some crazy limitations like by standard it does not support
> > full floating point spec, so NaN and infinity cannot be represented
> > there. So JSON is really no a great format when you want to preserve
> > as much of the input as possible (like, integers, floats, text, and
> > binary). SQLite seems to be spot on in this regard.
> >
> > But yes, if there would be some other standard to SQLite and supported
> > format to embed, that approach would be useful. Like composite value
> > types.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

Thanks for this input. So you are saying that calling sqlite3_column
100k times per row is slow, but retrieving 100k rows to construct one
"original" row will be faster? I am not sure I understand why reading
and decoding cells spread over multiple columns is so much slower than
reading and decoding cells spread over multiple rows.

Mitar

On Thu, Oct 17, 2019 at 3:38 PM Hick Gunter  wrote:
>
> I have the impression that you still do not grasp the folly of a 100k column 
> schema.
>
> See the example below, which only has 6 fields. As you can see, each field 
> requires a Column opcode and arguments (about 10 bytes) and a "register" to 
> hold the value (48 bytes), which for 100k columns uses about 5.5Megabytes to 
> retrieve a row from the database. It ill also involve SQLite decoding 100k 
> field values and your application calling sqlite3_column interface 100k times 
> for each and every row, which yield an expected performance of about 2 rows 
> per second. Can you afford to use that much memory and time?
>
> asql> create temp table genes (id integer primary key, name char, f1 char, f2 
> char, f3 char, f4 char);
> asql> .explain
> asql> explain select * from genes;
> addr  opcode p1p2p3p4 p5  comment
>   -        -  --  -
> 0 Init   0 13000  Start at 13
> 1 OpenRead   0 2 1 6  00  root=2 iDb=1; genes
> 2 Explain2 0 0 SCAN TABLE genes  00
> 3 Rewind 0 12000
> 4   Rowid  0 1 000  r[1]=rowid
> 5   Column 0 1 200  r[2]=genes.name
> 6   Column 0 2 300  r[3]=genes.f1
> 7   Column 0 3 400  r[4]=genes.f2
> 8   Column 0 4 500  r[5]=genes.f3
> 9   Column 0 5 600  r[6]=genes.f4
> 10  ResultRow  1 6 000  output=r[1..6]
> 11Next   0 4 001
> 12Halt   0 0 000
> 13Transaction1 0 1 0  01  usesStmtJournal=0
> 14Goto   0 1 000
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
> Auftrag von Mitar
> Gesendet: Donnerstag, 17. Oktober 2019 15:11
> An: SQLite mailing list 
> Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table
>
> Hi!
>
> On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > my suggestion would be to store them as JSON in a blob, and use the
> > JSON functions of SQLite to extract the data
>
> JSON has some crazy limitations like by standard it does not support full 
> floating point spec, so NaN and infinity cannot be represented there. So JSON 
> is really no a great format when you want to preserve as much of the input as 
> possible (like, integers, floats, text, and binary). SQLite seems to be spot 
> on in this regard.
>
> But yes, if there would be some other standard to SQLite and supported format 
> to embed, that approach would be useful. Like composite value types.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___
>  Gunter Hick | Software Engineer | Scientific Games International GmbH | 
> Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) 
> +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> my suggestion would be to store them as JSON in a blob, and use the JSON
> functions of SQLite to extract the data

JSON has some crazy limitations; for example, by the standard it does
not support the full floating point spec, so NaN and infinity cannot
be represented there. So JSON is really not a great format when you
want to preserve as much of the input as possible (integers, floats,
text, and binary). SQLite seems to be spot on in this regard.

But yes, if there were some other standard and supported format to
embed in SQLite, that approach would be useful. Like composite value
types.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

This is getting a bit off topic.

On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin  wrote:
> 1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel 
> maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data 
> directly from a file, but it has a max array size of 1.  R maxes out at 
> 2147483647, which is more than 2 billion.  But R has to hold all the data 
> from a matrix in memory at once and it can't assign enough memory to one 
> object to hold that many cells.

Of course, 2 billion is a lot. But 100k is something many ML libraries
support: Pandas, ndarray, R. Nothing too magical about that.

> 2) Object names are not data.  They're descriptions in your favourite human 
> language.  They're not meant to have weird sequences of characters in.

Not sure what this relates to.

> 3) Lots of CSV import filters ignore a column header row, or can only create 
> fieldnames with certain limits (max length, no punctuation characters, etc.). 
>  So you should expect to lose fieldnames if you try to import your data into 
> some new piece of software.

Does SQLite have limitations on what can be a column name? If not,
then I would not worry about what some CSV importers do. We would use
a good one to convert to SQLite.
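
(What we would want is roughly this, with made-up gene names as column
names; as far as I can tell quoted identifiers should allow it:)

CREATE TABLE expression_wide (
  sample_id INTEGER PRIMARY KEY,
  "TP53" REAL,
  "HLA-A" REAL
  -- ..., one quoted column per gene, up to ~100k of them
);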

> (4) SQLite stores all the data for a row is together, in a sequence.  If you 
> ask for the data in the 3756th column of a row, SQLite has to read and parse 
> the data for the first 3755 columns of that row, just to read a single value 
> from storage.  As you can imagine, this is slow and involves a lot of I/O.  
> And while it happens the row up to that point must all be held in memory.  
> Consequently, nobody who uses SQLite for its intended purpose actually does 
> this.  I dread to think how slow random access over 2 billion columns would 
> be in SQLite.

I wrote earlier that for us the use case where we read whole rows is
the most common one.

> Your gene expressions are data.  They are not the names of table entities.  
> They should be stored in a table as other posts suggested.

Maybe. But often this data is represented as a row of expressions with
a column for each gene. Because this is what is being distributed, we
are looking for ways to store it in a stable format which will be
supported for the next 50 years, without modifying the original data
too much. I do hear the suggestions to do such a transformation, but
that is less ideal for our use case.
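
(For reference, the transformation being suggested is, as I understand
it, roughly this, with made-up names: one row per (sample, gene) pair
instead of one column per gene:)

CREATE TABLE expression_long (
  sample_id INTEGER NOT NULL,
  gene TEXT NOT NULL,
  value REAL,
  PRIMARY KEY (sample_id, gene)
);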


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

In that case we would have to define a standard BLOB storage format,
slightly defeating the idea of using SQLite to define such a standard,
future-proof format. :-)


Mitar

On Thu, Oct 17, 2019 at 11:19 AM Hick Gunter  wrote:
>
> Since your data is at least mostly opaque in the sense that SQLite is not 
> expected to interpret the contents, why not split your data into "stuff you 
> want to query ins SQLite" and "stuff you want to just store"? The former 
> means individual columns, whereas the latter could be stored in a single BLOB 
> field, which only your application knows how to extract data from.
>
> This allows SQLite to efficiently process the fields it needs to know about, 
> and return BLOB data efficiently as one single field instead of having to 
> pick it apart into 100k bits.
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
> Auftrag von Mitar
> Gesendet: Donnerstag, 17. Oktober 2019 10:56
> An: Richard Hipp 
> Cc: SQLite mailing list 
> Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table
>
> Hi!
>
> I can see how this is a reasonable limit when SQLite is used for querying 
> power it provides. In our case we are really focusing on it as a standard 
> long-term storage format. So in the "Appropriate Uses For SQLite" document 
> [1] you have a section called "File archive and/or data container" and this 
> is why we started considering SQLite as a dataset archive format. We would 
> not like to store files directly, but contents of those files (like contents 
> of CSV). But try to not modify them more than necessary. So we got interested 
> especially in the "SQLite is a good solution for any situation that requires 
> bundling diverse content into a self-contained and self-describing package 
> for shipment across a network." statement. So I can understand how supporting 
> a large number of columns might be inappropriate when you want to run 
> complicated SQL queries on data, but to just store data and then extract all 
> rows to do some data processing, Or as the most complicated query it would be 
> to extract just a subsample of rows. But not really do to any JOIN queries or 
> something like that. it looks like except for artificial limit in SQLite, 
> because it is not useful for general case, there is no other reason why it 
> could not be supported.
>
> So why not increase the limit to 2 billion, and have it at runtime by default 
> limited to 2000. And then using PRAGMA one could increase this if needed to 2 
> billion? PRAGMA already can decrease the limit, so we can keep the existing 
> 2000 limit, but to support it without having to recompile, people could 
> increase it all the way to 2 billion. Is there any significant performance 
> downside to this?
>
> [1] https://www.sqlite.org/whentouse.html
>
>
> Mitar
>
> On Wed, Oct 16, 2019 at 8:21 PM Richard Hipp  wrote:
> >
> > SQLite could, in theory, be enhanced (with just a few minor tweaks) to
> > support up to 2 billion columns.  But having a relation with a large
> > number of columns seems like a very bad idea stylistically.  That's
> > not how relational databases are intended to be used.  Normally when a
> > table acquires more than a couple dozen columns, that is a good
> > indication that you need normalize and/or refactor your schema. Schema
> > designers almost unanimously follow that design principle.  And so
> > SQLite is optimized for the overwhelmingly common case of a small
> > number of columns per table.
> >
> > Hence, adding the ability to have a table with a huge number of
> > columns is not something that I am interested in supporting in SQLite
> > at this time.
> >
> > --
> > D. Richard Hipp
> > d...@sqlite.org
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___
>  Gunter Hick | Software Engineer | Scientific Games International GmbH | 
> Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) 
> +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

