Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub

2016-02-04 Thread Goffi
On Wednesday, 3 February 2016, at 10:47:45, Stephen Paul Weber wrote:
> > But I have to say that MAM is really badly adapted to PubSub
> 
> I'm curious why one would *want* to use MAM with PubSub, since PubSub
> already specifies a way of storing and fetching items?


For the filtering capabilities (i.e. searching in a pubsub node). In SàT we use
it to look for items matching an Atom category.
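
To make the use case concrete, here is a minimal sketch of that kind of category filter, assuming the item payloads are Atom entries. The item ids, the payloads and the helper names are made up for illustration; this is not SàT's actual code.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def has_category(entry_xml: str, term: str) -> bool:
    """Return True if the Atom entry has a <category/> whose term equals `term`."""
    entry = ET.fromstring(entry_xml)
    return any(
        cat.get("term") == term
        for cat in entry.iter(f"{{{ATOM_NS}}}category")
    )


# Hypothetical payloads of two pubsub items, keyed by item id.
ITEMS = {
    "item-1": f'<entry xmlns="{ATOM_NS}"><title>MAM and PubSub</title>'
              f'<category term="XMPP"/></entry>',
    "item-2": f'<entry xmlns="{ATOM_NS}"><title>Holidays</title>'
              f'<category term="travel"/></entry>',
}


def search_by_category(items: dict, term: str) -> list:
    """Return the ids of the items tagged with the given category term."""
    return [item_id for item_id, xml in items.items() if has_category(xml, term)]


print(search_by_category(ITEMS, "XMPP"))
```

A server-side search would do the same test against its item store instead of an in-memory dict.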

Actually we talked about this at the summit, and MAM could do it because:

- the MUST in "The archive results MUST be sorted in chronological
order" can be changed if another XEP says so

- other XEPs could probably address searching across several nodes, and the
other points of this kind that I have raised.

I'm still concerned about the overhead of putting everything in a message,
even if the stanza size issue raised by Kev is a good point. But this may also
be fixable with an additional XEP.

++
Goffi
___
Standards mailing list
Info: http://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___


Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub

2016-01-27 Thread Goffi
Hi Kev,

thanks for your answer, I put a few notes here so we can talk about it if 
needed tomorrow.

On Sunday, 24 January 2016, at 17:25:44, Kevin Smith wrote:
> On 6 Jan 2016, at 11:08, Goffi  wrote:
> > - All items are returned in separate <message/> stanzas, wrapped in a
> > <forwarded/> element, one item per stanza. This is both a waste of
> > bandwidth and makes the task more difficult for the client, as it must
> > track each <message/> and the <iq/> result to know when a page has been
> > received. A simple <iq/> query, like for a PubSub items retrieval, would
> > be much better.
> 
> Aren’t you going to have huge troubles with stanza sizes in that case? It
> seems like once you start wrapping multiple pubsub items together you’re
> going to start exceeding stanza sizes and needing to deal with the code for
> merging them anyway.

That's actually what PubSub itself does, so if we have an issue with stanza
size, we should start worrying about XEP-0060.
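
To illustrate the bookkeeping I am complaining about, here is a minimal sketch of what a client has to do with MAM compared with a plain PubSub retrieval. The class and function names are hypothetical and the stanzas are reduced to ids and strings; a real client would parse XML.

```python
class MamPageCollector:
    """Collects one-item-per-<message/> results until the closing <iq/> result."""

    def __init__(self, iq_id: str, query_id: str):
        self.iq_id = iq_id          # id of the <iq/> that carried the query
        self.query_id = query_id    # queryid echoed in each result message
        self.items = []
        self.complete = False

    def on_message(self, query_id: str, item: str) -> None:
        # Other traffic can be interleaved on the stream, so every
        # message has to be matched against the pending query.
        if query_id == self.query_id and not self.complete:
            self.items.append(item)

    def on_iq_result(self, iq_id: str) -> bool:
        # The page is only known to be complete once the <iq/> result arrives.
        if iq_id == self.iq_id:
            self.complete = True
        return self.complete


def pubsub_page(iq_items: list) -> list:
    """With a PubSub items retrieval the whole page is in one <iq/> result."""
    return list(iq_items)


collector = MamPageCollector(iq_id="q1", query_id="f27")
collector.on_message("f27", "item-a")
collector.on_message("other", "unrelated")
collector.on_message("f27", "item-b")
collector.on_iq_result("q1")
print(collector.items)
print(pubsub_page(["item-a", "item-b"]))
```

The PubSub side needs no state at all, which is the point of the comparison.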


> > - Requests are made on one node. But it is desirable to be able to make
> > requests on several nodes, or on nodes which match a pattern. For
> > instance, in XEP-0277 comment nodes are of the form
> > "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2" and it
> > would be useful to request all items from nodes starting with
> > "urn:xmpp:microblog:0:comments". With MAM I can't request all comments
> > published by Romeo.
> 
> I think that’s a fairly simple extension for someone to spec, isn’t it?

A MAM request is recognised as a pubsub request by checking the node attribute.
A wildcard could be used for the use case I have given. But what if I want to
look at several nodes? Or ignore the node? We can always write XEPs to work
around this, but it can quickly complicate the request.
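
The three cases (wildcard, several nodes, no node at all) can be sketched as below. This is only an illustration of the matching a service would have to do: the shell-style pattern syntax and the select_nodes helper are assumptions, nothing like this is specified in MAM, and the second comments node name is invented.

```python
from fnmatch import fnmatchcase

# Node names as they could exist on a service (the second comments node is made up).
NODES = [
    "urn:xmpp:microblog:0",
    "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2",
    "urn:xmpp:microblog:0:comments/0123456789abcdef0123456789abcdef",
    "urn:xmpp:avatar:metadata",
]


def select_nodes(nodes, patterns=None):
    """Return the nodes matching any pattern, in their original order.

    patterns=None stands for "ignore the node": every node is selected.
    """
    if patterns is None:
        return list(nodes)
    return [n for n in nodes if any(fnmatchcase(n, p) for p in patterns)]


# Wildcard: all comments nodes.
print(select_nodes(NODES, ["urn:xmpp:microblog:0:comments/*"]))
# Several explicit nodes at once.
print(select_nodes(NODES, ["urn:xmpp:microblog:0", "urn:xmpp:avatar:metadata"]))
# No node restriction.
print(len(select_nodes(NODES)))
```

Each of these variants would need its own representation in the query, which is where the request starts to get complicated.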

> > - When a service offers MAM for both messages and PubSub (e.g. a MUC
> > component with PubSub abilities (MUC 2?), or the server itself when it
> > offers PEP), there is no way to know if the filtering fields apply to
> > messages, or PubSub, or both.
> > Look at section 4.1.5 "Retrieving form fields": how can I know if
> > "urn:example:xmpp:free-text-search" can be used for PubSub or not?
> 
> I imagine you request the form for the node you’re interested in querying.
> If that’s not clear, we should make it so.

But then we are back to our problem of querying multiple nodes at once, or
nodes starting with a namespace.

> > - Section 4.2 says that "The archive results MUST be sorted in
> > chronological order". That totally makes sense for message archives, but
> > in the case of PubSub this is inconsistent with the classic items
> > retrieval ordering (most recent item first), and we may want to sort on
> > fields other than publication date: for instance item update date vs
> > publication date, or the size of files tracked with pubsub.
> > Of course we can easily reverse the order with RSM, but it's not
> > natural, and we can't sort on other fields.
> 
> This doesn’t seem insurmountable. We have data forms for the queries if we
> want to change behaviour.

If the MUST disappears, this one is indeed easily fixable.
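
For what it's worth, the ordering I have in mind is nothing exotic. A minimal sketch, assuming each item carries a publication date, an update date and a size (the Item fields and the sort_items helper are hypothetical, not part of any XEP):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    item_id: str
    published: datetime
    updated: datetime
    size: int  # e.g. size in bytes of a file tracked with pubsub


SORTABLE = ("published", "updated", "size")


def sort_items(items, key="published", descending=False):
    """Sort items on one of the allowed fields, in either direction."""
    if key not in SORTABLE:
        raise ValueError(f"cannot sort on {key!r}")
    return sorted(items, key=lambda item: getattr(item, key), reverse=descending)


items = [
    Item("a", datetime(2016, 1, 1), datetime(2016, 1, 20), 300),
    Item("b", datetime(2016, 1, 5), datetime(2016, 1, 6), 100),
    Item("c", datetime(2016, 1, 3), datetime(2016, 1, 25), 200),
]

# Chronological order, as MAM mandates today.
print([i.item_id for i in sort_items(items)])
# Most recent first, as with a classic pubsub retrieval.
print([i.item_id for i in sort_items(items, descending=True)])
# Most recently updated first, or by size.
print([i.item_id for i in sort_items(items, key="updated", descending=True)])
print([i.item_id for i in sort_items(items, key="size")])
```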

> > - Overall, PubSub already manages archives by design, but it lacks a
> > good search tool. Even if it is tempting to use MAM with PubSub
> > because we can have filtering "for free", I really think it is not
> > suited to it at all, and PubSub deserves a real dedicated
> > searching/filtering tool.
> 
> I would be very keen to move towards one method for doing history queries
> and not having the current plethora (offline messages, MUC context, PubSub,
> …).
> 
> > If other people are interested, I would like to work on a "PubSub
> > searching" protoXEP. PubSub will probably be the core of many major
> > features in XMPP in the future, so we need a good, generic, and
> > extendable way to search/filter items.
> 
> I think the effort would be much better spent adding MAM extensions as
> necessary.

I'm also thinking about a way to do complex queries (with AND/OR filtering), and
I don't have the feeling that it's a goal for MAM. But again, this may be
fixable by another XEP.
My two main grievances are the items being returned in <message/> stanzas and
the impossibility of querying multiple nodes or nodes with a wildcard. If these
two are fixed, I guess MAM can start to be a better option.
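
By complex queries I mean something along these lines. This is only a sketch of the semantics: the nested-tuple representation, the "eq"/"and"/"or" operators and the field names are invented for the example, and no wire format is implied.

```python
def matches(item: dict, expr: tuple) -> bool:
    """Evaluate a nested AND/OR filter expression against one item.

    expr is ("eq", field, value), ("and", e1, e2, ...) or ("or", e1, e2, ...).
    """
    op = expr[0]
    if op == "eq":
        _, field, value = expr
        return item.get(field) == value
    if op == "and":
        return all(matches(item, sub) for sub in expr[1:])
    if op == "or":
        return any(matches(item, sub) for sub in expr[1:])
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical items with a few searchable fields.
ITEMS = [
    {"id": "1", "author": "romeo", "category": "XMPP"},
    {"id": "2", "author": "juliet", "category": "XMPP"},
    {"id": "3", "author": "romeo", "category": "travel"},
]

# category == "XMPP" AND (author == "romeo" OR author == "juliet")
query = ("and",
         ("eq", "category", "XMPP"),
         ("or", ("eq", "author", "romeo"), ("eq", "author", "juliet")))

print([item["id"] for item in ITEMS if matches(item, query)])
```

A flat data form can express the AND of a few fields, but nesting like this is where I doubt it fits.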

> 
> /K

Goffi


Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub

2016-01-24 Thread Kevin Smith
On 6 Jan 2016, at 11:08, Goffi  wrote:
> - All items are returned in separate <message/> stanzas, wrapped in a
> <forwarded/> element, one item per stanza. This is both a waste of bandwidth
> and makes the task more difficult for the client, as it must track each
> <message/> and the <iq/> result to know when a page has been received. A
> simple <iq/> query, like for a PubSub items retrieval, would be much better.

Aren’t you going to have huge troubles with stanza sizes in that case? It seems 
like once you start wrapping multiple pubsub items together you’re going to 
start exceeding stanza sizes and needing to deal with the code for merging them 
anyway.

> - Requests are made on one node. But it is desirable to be able to make
> requests on several nodes, or on nodes which match a pattern. For instance,
> in XEP-0277 comment nodes are of the form
> "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2" and it would
> be useful to request all items from nodes starting with
> "urn:xmpp:microblog:0:comments". With MAM I can't request all comments
> published by Romeo.

I think that’s a fairly simple extension for someone to spec, isn’t it?

> - this one could be easily fixed, but currently we can't do filtering on PEP 
> without requesting a particular jid. With microblog, we want to be able to 
> request e.g. all items with the category/tag "XMPP" regardless of the author.

Same.

> - When a service offers MAM for both messages and PubSub (e.g. a MUC
> component with PubSub abilities (MUC 2?), or the server itself when it
> offers PEP), there is no way to know if the filtering fields apply to
> messages, or PubSub, or both.
> Look at section 4.1.5 "Retrieving form fields": how can I know if
> "urn:example:xmpp:free-text-search" can be used for PubSub or not?

I imagine you request the form for the node you’re interested in querying. If 
that’s not clear, we should make it so.

> - Section 4.2 says that "The archive results MUST be sorted in chronological
> order". That totally makes sense for message archives, but in the case of
> PubSub this is inconsistent with the classic items retrieval ordering (most
> recent item first), and we may want to sort on fields other than publication
> date: for instance item update date vs publication date, or the size of
> files tracked with pubsub.
> Of course we can easily reverse the order with RSM, but it's not natural,
> and we can't sort on other fields.

This doesn’t seem insurmountable. We have data forms for the queries if we want 
to change behaviour.

> - Overall, PubSub already manages archives by design, but it lacks a good
> search tool. Even if it is tempting to use MAM with PubSub because we can
> have filtering "for free", I really think it is not suited to it at all, and
> PubSub deserves a real dedicated searching/filtering tool.

I would be very keen to move towards one method for doing history queries and 
not having the current plethora (offline messages, MUC context, PubSub, …).

> If other people are interested, I would like to work on a "PubSub searching" 
> protoXEP. PubSub will probably be the core of many major features in XMPP in 
> the future, so we need a good, generic, and extendable way to search/filter 
> items.

I think the effort would be much better spent adding MAM extensions as 
necessary.

/K


[Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub

2016-01-06 Thread Goffi
G'day,

MAM is a great tool which solves several problems for message management. It
also offers the ability to get items from a PubSub node when the "node"
attribute is used.
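
For readers who have not used this, here is a minimal sketch of such a request built with Python's standard library. The "urn:xmpp:mam:1" namespace is an assumption about which version of XEP-0313 is deployed, and the build_mam_query helper and the ids are hypothetical.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:1"    # assumption: depends on the XEP-0313 version in use
DATA_NS = "jabber:x:data"    # data forms


def build_mam_query(iq_id, query_id, node, start=None):
    """Build an <iq/> carrying a MAM query restricted to one pubsub node."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(
        iq, f"{{{MAM_NS}}}query", {"queryid": query_id, "node": node}
    )
    form = ET.SubElement(query, f"{{{DATA_NS}}}x", {"type": "submit"})
    form_type = ET.SubElement(
        form, f"{{{DATA_NS}}}field", {"var": "FORM_TYPE", "type": "hidden"}
    )
    ET.SubElement(form_type, f"{{{DATA_NS}}}value").text = MAM_NS
    if start is not None:
        # Optional filter: only items archived at or after this timestamp.
        field = ET.SubElement(form, f"{{{DATA_NS}}}field", {"var": "start"})
        ET.SubElement(field, f"{{{DATA_NS}}}value").text = start
    return iq


iq = build_mam_query("q1", "f27", "urn:xmpp:microblog:0",
                     start="2016-01-01T00:00:00Z")
print(ET.tostring(iq, encoding="unicode"))
```

The only pubsub-specific part is the "node" attribute on the query element; everything else is the same form used for message archives.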

We have implemented this feature in our PubSub/PEP component, and I haven't 
seen any other implementation for PubSub so far (if you know any, please tell 
me).

But I have to say that MAM is really poorly suited to PubSub; here are the
major reasons:

- All items are returned in separate <message/> stanzas, wrapped in a
<forwarded/> element, one item per stanza. This is both a waste of bandwidth
and makes the task more difficult for the client, as it must track each
<message/> and the <iq/> result to know when a page has been received. A
simple <iq/> query, like for a PubSub items retrieval, would be much better.

- Requests are made on one node. But it is desirable to be able to make requests
on several nodes, or on nodes which match a pattern. For instance, in XEP-0277
comment nodes are of the form
"urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2" and it would
be useful to request all items from nodes starting with
"urn:xmpp:microblog:0:comments". With MAM I can't request all comments
published by Romeo.

- this one could be easily fixed, but currently we can't do filtering on PEP 
without requesting a particular jid. With microblog, we want to be able to 
request e.g. all items with the category/tag "XMPP" regardless of the author.

- When a service offers MAM for both messages and PubSub (e.g. a MUC component
with PubSub abilities (MUC 2?), or the server itself when it offers PEP), there
is no way to know if the filtering fields apply to messages, or PubSub, or
both.
Look at section 4.1.5 "Retrieving form fields": how can I know if
"urn:example:xmpp:free-text-search" can be used for PubSub or not?

- Section 4.2 says that "The archive results MUST be sorted in chronological
order". That totally makes sense for message archives, but in the case of
PubSub this is inconsistent with the classic items retrieval ordering (most
recent item first), and we may want to sort on fields other than publication
date: for instance item update date vs publication date, or the size of files
tracked with pubsub.
Of course we can easily reverse the order with RSM, but it's not natural,
and we can't sort on other fields.

- Overall, PubSub already manages archives by design, but it lacks a good
search tool. Even if it is tempting to use MAM with PubSub because we can
have filtering "for free", I really think it is not suited to it at all, and
PubSub deserves a real dedicated searching/filtering tool.

If other people are interested, I would like to work on a "PubSub searching" 
protoXEP. PubSub will probably be the core of many major features in XMPP in 
the future, so we need a good, generic, and extendable way to search/filter 
items.

Regards
Goffi



Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub

2016-01-06 Thread Jefry Lagrange

Hello,

I think it is a good idea for there to be a search extension for pubsub.

One thing to keep in mind is that the extension could become
really complicated depending on the search fields that you are going to
have and the type of filter you want. If there are user-defined fields,
then special care should be taken not to make the stanza cumbersome.


One idea you could implement, for instance, to make it more compact, is to
use fields defined in other namespaces, so in essence you'd be searching
a particular XEP namespace (for
example urn:xmpp:jingle:apps:file-transfer:4) and you could reuse its
fields instead of redefining them. Of course you would still need to
have user-defined fields, as I don't think that searching namespaces
would handle all use cases.
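
A rough sketch of what I mean, with everything hypothetical: the KNOWN_FIELDS registry, the "{namespace}field" naming and the build_filters helper are only there to show the idea of reusing fields from an existing namespace next to user-defined ones.

```python
FT_NS = "urn:xmpp:jingle:apps:file-transfer:4"

# Hypothetical registry: fields a search could reuse from an existing namespace
# instead of redefining them.
KNOWN_FIELDS = {
    FT_NS: {"name", "size"},
}


def qualify(namespace: str, field: str) -> str:
    """Name a reused field by the namespace that defines it."""
    return f"{{{namespace}}}{field}"


def build_filters(namespace, criteria, user_defined=None):
    """Combine fields reused from a namespace with user-defined fields."""
    unknown = set(criteria) - KNOWN_FIELDS.get(namespace, set())
    if unknown:
        raise ValueError(f"fields not defined in {namespace}: {sorted(unknown)}")
    filters = {qualify(namespace, field): value for field, value in criteria.items()}
    # User-defined fields are kept as-is; they still need their own definition.
    filters.update(user_defined or {})
    return filters


print(build_filters(FT_NS, {"name": "report.pdf"}, {"rating": "5"}))
```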


Good luck,

For reference you can have a look at: XEP-0055


