Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub
On Wednesday 3 February 2016, 10:47:45, Stephen Paul Weber wrote:
> > But I have to say that MAM is really badly adapted to PubSub
>
> I'm curious why one would *want* to use MAM with PubSub, since PubSub
> already specifies a way of storing and fetching items?

For the filtering capabilities (i.e. searching in a PubSub node). In SàT we use it to look for items matching an Atom category.

Actually, we talked about this at the summit, and MAM could do it because:

- the MUST in "The archive results MUST be sorted in chronological order" can be changed if another XEP says so
- other XEPs could probably address searching on several nodes and the other points I have raised.

I'm still concerned about the overhead of putting everything in messages, even if the stanza size issue raised by Kev is a good point. But this may also be fixable with an additional XEP.

++
Goffi
___
Standards mailing list
Info: http://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___
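To make the filtering use case above concrete, here is a minimal sketch of selecting PubSub item payloads by Atom category. It is not how SàT implements it: the function name `entries_with_category` and the two sample entries are invented for illustration, and the filtering is done on already-fetched payloads rather than on the server.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def entries_with_category(items, term):
    """Return the payloads whose Atom entry carries <category term=...>.

    `items` is a list of Atom <entry/> payloads as XML strings
    (hypothetical sample data, not fetched from a real node).
    """
    matches = []
    for xml_text in items:
        entry = ET.fromstring(xml_text)
        # Collect every category term declared on this entry.
        terms = {c.get("term") for c in entry.iter(f"{{{ATOM_NS}}}category")}
        if term in terms:
            matches.append(xml_text)
    return matches


# Invented sample payloads.
items = [
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title>MAM and PubSub</title><category term="XMPP"/></entry>',
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title>Holiday</title><category term="travel"/></entry>',
]
result = entries_with_category(items, "XMPP")
```

A server-side filter would avoid transferring the non-matching items at all, which is the reason for wanting a query mechanism in the first place.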
Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub
Hi Kev,

thanks for your answer. I have put a few notes here so we can talk about them tomorrow if needed.

On Sunday 24 January 2016, 17:25:44, Kevin Smith wrote:
> On 6 Jan 2016, at 11:08, Goffi wrote:
> > - All items are returned in separate <message/> stanzas, each wrapped
> > in a <result/> element, one item per stanza. This is both a waste of
> > bandwidth and makes the task more difficult for the client, as it must
> > track each <message/> and the <iq/> result to know when a page has been
> > received. A simple <iq/> query, like the one used for PubSub items
> > retrieval, would be much better.
>
> Aren’t you going to have huge troubles with stanza sizes in that case? It
> seems like once you start wrapping multiple pubsub items together you’re
> going to start exceeding stanza sizes and needing to deal with the code for
> merging them anyway.

That's actually what PubSub itself does, so if we have an issue with stanza size, we should start worrying about XEP-0060.

> > - Requests are made on one node. But it is desirable to be able to make
> > requests on several nodes, or on nodes which match a pattern. For
> > instance, in XEP-0277 comments nodes are of the form
> > "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2", and it
> > would be useful to request all items from nodes starting with
> > "urn:xmpp:microblog:0:comments". With MAM I can't request all comments
> > published by Romeo.
>
> I think that’s a fairly simple extension for someone to spec, isn’t it?

A MAM request detects whether it is a PubSub request by checking the node attribute. A wildcard could be used for the use case I have given. But what if I want to look at several nodes? Or ignore the node? We can always write XEPs to work around this, but it can quickly complicate the request.

> > - When a service offers MAM for both messages and PubSub (e.g. a MUC
> > component with PubSub abilities (MUC 2?), or the server itself when it
> > offers PEP), there is no way to know whether the filtering fields apply
> > to messages, to PubSub, or to both.
> > Look at section 4.1.5 "Retrieving form fields": how can I know if
> > "urn:example:xmpp:free-text-search" can be used for PubSub or not?
>
> I imagine you request the form for the node you’re interested in querying.
> If that’s not clear, we should make it so.

But then we are back to our problem of querying multiple nodes at once, or nodes starting with a namespace.

> > - Section 4.2 says that "The archive results MUST be sorted in
> > chronological order". That totally makes sense for message archives, but
> > in the case of PubSub it is inconsistent with the classic items retrieval
> > ordering (most recent item first), and we may want to sort on fields
> > other than the publication date: for instance item update date vs
> > publication date, or size of files tracked with PubSub.
> > Of course we can easily reverse the order with RSM, but it's not
> > natural, and we can't sort on other fields.
>
> This doesn’t seem insurmountable. We have data forms for the queries if we
> want to change behaviour.

If the MUST disappears, this one is indeed easily fixable.

> > - Overall, PubSub already manages archives by design, but it lacks a
> > good search tool. Even if it is tempting to use MAM with PubSub
> > because we get filtering "for free", I really think it is not
> > suited to it at all, and PubSub deserves a real dedicated
> > searching/filtering tool.
>
> I would be very keen to move towards one method for doing history queries
> and not having the current plethora (offline messages, MUC context, PubSub,
> …).

> > If other people are interested, I would like to work on a "PubSub
> > searching" protoXEP. PubSub will probably be at the core of many major
> > features in XMPP in the future, so we need a good, generic, and
> > extensible way to search/filter items.
>
> I think the effort would be much better spent adding MAM extensions as
> necessary.

I'm also thinking about ways to do complex queries (with AND/OR filtering), and I don't have the feeling that this is a goal for MAM. But again, this could be fixed by another XEP.

My two main grievances are the items being returned in <message/> stanzas and the impossibility of querying multiple nodes or nodes with a wildcard. If these two are fixed, I guess MAM can start to be a better option.

>
> /K

Goffi
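As an illustration of the multi-node grievance, the following sketch shows the kind of node selection a service would have to perform if a query could name several nodes or a prefix. Nothing here is specified by XEP-0313 or XEP-0060: `select_nodes`, its parameters, and all node names except the one quoted in the thread are assumptions made up for the example.

```python
def select_nodes(nodes, prefix=None, exact=()):
    """Pick node names either listed explicitly or starting with `prefix`.

    Hypothetical helper: neither parameter corresponds to an existing
    MAM or PubSub query field.
    """
    wanted = set(exact)
    return [
        name
        for name in nodes
        if name in wanted or (prefix is not None and name.startswith(prefix))
    ]


# The second node name comes from the thread; the others are invented.
nodes = [
    "urn:xmpp:microblog:0",
    "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2",
    "urn:xmpp:microblog:0:comments/example-second-post",
    "urn:xmpp:avatar:metadata",
]

comment_nodes = select_nodes(nodes, prefix="urn:xmpp:microblog:0:comments")
several_nodes = select_nodes(
    nodes, exact=["urn:xmpp:microblog:0", "urn:xmpp:avatar:metadata"]
)
```

The selection itself is trivial; the open question raised in the thread is how to express it in the request without making the query form unwieldy.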
Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub
On 6 Jan 2016, at 11:08, Goffi wrote:
> - All items are returned in separate <message/> stanzas, each wrapped in a
> <result/> element, one item per stanza. This is both a waste of bandwidth
> and makes the task more difficult for the client, as it must track each
> <message/> and the <iq/> result to know when a page has been received. A
> simple <iq/> query, like the one used for PubSub items retrieval, would be
> much better.

Aren’t you going to have huge troubles with stanza sizes in that case? It seems like once you start wrapping multiple pubsub items together you’re going to start exceeding stanza sizes and needing to deal with the code for merging them anyway.

> - Requests are made on one node. But it is desirable to be able to make
> requests on several nodes, or on nodes which match a pattern. For instance,
> in XEP-0277 comments nodes are of the form
> "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2", and it
> would be useful to request all items from nodes starting with
> "urn:xmpp:microblog:0:comments". With MAM I can't request all comments
> published by Romeo.

I think that’s a fairly simple extension for someone to spec, isn’t it?

> - This one could be easily fixed, but currently we can't do filtering on PEP
> without requesting a particular JID. With microblog, we want to be able to
> request e.g. all items with the category/tag "XMPP" regardless of the author.

Same.

> - When a service offers MAM for both messages and PubSub (e.g. a MUC
> component with PubSub abilities (MUC 2?), or the server itself when it
> offers PEP), there is no way to know whether the filtering fields apply to
> messages, to PubSub, or to both.
> Look at section 4.1.5 "Retrieving form fields": how can I know if
> "urn:example:xmpp:free-text-search" can be used for PubSub or not?

I imagine you request the form for the node you’re interested in querying. If that’s not clear, we should make it so.

> - Section 4.2 says that "The archive results MUST be sorted in chronological
> order". That totally makes sense for message archives, but in the case of
> PubSub it is inconsistent with the classic items retrieval ordering (most
> recent item first), and we may want to sort on fields other than the
> publication date: for instance item update date vs publication date, or
> size of files tracked with PubSub.
> Of course we can easily reverse the order with RSM, but it's not natural,
> and we can't sort on other fields.

This doesn’t seem insurmountable. We have data forms for the queries if we want to change behaviour.

> - Overall, PubSub already manages archives by design, but it lacks a good
> search tool. Even if it is tempting to use MAM with PubSub because we get
> filtering "for free", I really think it is not suited to it at all, and
> PubSub deserves a real dedicated searching/filtering tool.

I would be very keen to move towards one method for doing history queries and not having the current plethora (offline messages, MUC context, PubSub, …).

> If other people are interested, I would like to work on a "PubSub searching"
> protoXEP. PubSub will probably be at the core of many major features in XMPP
> in the future, so we need a good, generic, and extensible way to
> search/filter items.

I think the effort would be much better spent adding MAM extensions as necessary.

/K
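The ordering point discussed above can be illustrated with a small sketch: the same set of items sorted chronologically (what the MUST in section 4.2 requires), most recent first (the usual PubSub retrieval order), and by update date. The item records, field names, and the `sort_items` helper are all invented for the example; no existing query field is implied.

```python
from datetime import datetime

# Invented sample items: "published" and "updated" are illustrative fields.
items = [
    {"id": "a", "published": datetime(2016, 1, 4), "updated": datetime(2016, 1, 20)},
    {"id": "b", "published": datetime(2016, 1, 5), "updated": datetime(2016, 1, 6)},
    {"id": "c", "published": datetime(2016, 1, 6), "updated": datetime(2016, 1, 7)},
]


def sort_items(items, key="published", newest_first=False):
    """Return a new list sorted on `key`, optionally most recent first."""
    return sorted(items, key=lambda item: item[key], reverse=newest_first)


# Chronological order, as required for archive results.
chronological = [i["id"] for i in sort_items(items)]
# Most recent item first, as in classic PubSub items retrieval.
newest_first = [i["id"] for i in sort_items(items, newest_first=True)]
# Sorting on another field: item "a" was published first but updated last.
by_update = [i["id"] for i in sort_items(items, key="updated", newest_first=True)]
```

Reversing the order only needs a flag, but sorting on a different field changes which item comes first, which is why paging backwards with RSM cannot substitute for it.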
[Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub
G'day,

MAM is a great tool which solves several problems for message management. It also offers the ability to get items from a PubSub node when the "node" attribute is used.

We have implemented this feature in our PubSub/PEP component, and I haven't seen any other implementation for PubSub so far (if you know of any, please tell me).

But I have to say that MAM is really badly adapted to PubSub; here are the major reasons:

- All items are returned in separate <message/> stanzas, each wrapped in a <result/> element, one item per stanza. This is both a waste of bandwidth and makes the task more difficult for the client, as it must track each <message/> and the <iq/> result to know when a page has been received. A simple <iq/> query, like the one used for PubSub items retrieval, would be much better.

- Requests are made on one node. But it is desirable to be able to make requests on several nodes, or on nodes which match a pattern. For instance, in XEP-0277 comments nodes are of the form "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2", and it would be useful to request all items from nodes starting with "urn:xmpp:microblog:0:comments". With MAM I can't request all comments published by Romeo.

- This one could be easily fixed, but currently we can't do filtering on PEP without requesting a particular JID. With microblog, we want to be able to request e.g. all items with the category/tag "XMPP" regardless of the author.

- When a service offers MAM for both messages and PubSub (e.g. a MUC component with PubSub abilities (MUC 2?), or the server itself when it offers PEP), there is no way to know whether the filtering fields apply to messages, to PubSub, or to both. Look at section 4.1.5 "Retrieving form fields": how can I know if "urn:example:xmpp:free-text-search" can be used for PubSub or not?

- Section 4.2 says that "The archive results MUST be sorted in chronological order". That totally makes sense for message archives, but in the case of PubSub it is inconsistent with the classic items retrieval ordering (most recent item first), and we may want to sort on fields other than the publication date: for instance item update date vs publication date, or size of files tracked with PubSub. Of course we can easily reverse the order with RSM, but it's not natural, and we can't sort on other fields.

- Overall, PubSub already manages archives by design, but it lacks a good search tool. Even if it is tempting to use MAM with PubSub because we get filtering "for free", I really think it is not suited to it at all, and PubSub deserves a real dedicated searching/filtering tool.

If other people are interested, I would like to work on a "PubSub searching" protoXEP. PubSub will probably be at the core of many major features in XMPP in the future, so we need a good, generic, and extensible way to search/filter items.

Regards
Goffi
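The first point in the message above, that a client has to track every result stanza plus the final IQ result, can be sketched as follows. This is a simplified illustration, not a real client: the stanzas are hand-written strings without the usual `jabber:client` namespace or forwarded payloads, the `urn:xmpp:mam:2` namespace is an assumption (the version in use at the time of the thread may have differed), and `PageCollector` is a made-up name.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace version; adjust to whatever the service advertises.
MAM_NS = "urn:xmpp:mam:2"


class PageCollector:
    """Collect <message/> results for one query until its <iq/> result arrives."""

    def __init__(self, query_id, iq_id):
        self.query_id = query_id
        self.iq_id = iq_id
        self.results = []
        self.complete = False

    def feed(self, stanza_xml):
        stanza = ET.fromstring(stanza_xml)
        if stanza.tag == "message":
            result = stanza.find(f"{{{MAM_NS}}}result")
            # Keep only results that belong to our query.
            if result is not None and result.get("queryid") == self.query_id:
                self.results.append(result.get("id"))
        elif stanza.tag == "iq":
            # The page is complete once the IQ result for our request arrives.
            if stanza.get("type") == "result" and stanza.get("id") == self.iq_id:
                self.complete = True


# Hand-written sample stream; ids and queryids are invented.
stream = [
    '<message><result xmlns="urn:xmpp:mam:2" queryid="q1" id="item-1"/></message>',
    '<message><result xmlns="urn:xmpp:mam:2" queryid="other" id="item-x"/></message>',
    '<message><result xmlns="urn:xmpp:mam:2" queryid="q1" id="item-2"/></message>',
    '<iq type="result" id="iq1"><fin xmlns="urn:xmpp:mam:2"/></iq>',
]

collector = PageCollector("q1", "iq1")
for stanza_xml in stream:
    collector.feed(stanza_xml)
```

With a single IQ response carrying all items, as in PubSub items retrieval, this per-query bookkeeping would not be needed, at the cost of larger stanzas.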
Re: [Standards] XEP-0313: why it is *really* not a good idea to use MAM with Pubsub
Hello,

I think it is a good idea for there to be a search extension for PubSub. One thing to keep in mind is that the extension could become really complicated, depending on the search fields that you are going to have and the type of filter you want. If there are user-defined fields, then special care should be taken not to make the stanza cumbersome.

One idea you could implement to make it more compact is to use fields defined in other namespaces, so in essence you'd be searching a particular XEP namespace (for example urn:xmpp:jingle:apps:file-transfer:4) and you could reuse its fields instead of redefining them. Of course you would still need to have user-defined fields, as I don't think that searching namespaces would handle all use cases.

Good luck,

For reference you can have a look at: XEP-0055

On 06/01/16 07:08, Goffi wrote:
> G'day,
>
> MAM is a great tool which solves several problems for message management.
> It also offers the ability to get items from a PubSub node when the "node"
> attribute is used.
>
> We have implemented this feature in our PubSub/PEP component, and I haven't
> seen any other implementation for PubSub so far (if you know of any, please
> tell me).
>
> But I have to say that MAM is really badly adapted to PubSub; here are the
> major reasons:
>
> - All items are returned in separate <message/> stanzas, each wrapped in a
> <result/> element, one item per stanza. This is both a waste of bandwidth
> and makes the task more difficult for the client, as it must track each
> <message/> and the <iq/> result to know when a page has been received. A
> simple <iq/> query, like the one used for PubSub items retrieval, would be
> much better.
>
> - Requests are made on one node. But it is desirable to be able to make
> requests on several nodes, or on nodes which match a pattern. For instance,
> in XEP-0277 comments nodes are of the form
> "urn:xmpp:microblog:0:comments/dd88c9bc58886fce0049ed050df0c5f2", and it
> would be useful to request all items from nodes starting with
> "urn:xmpp:microblog:0:comments".
> With MAM I can't request all comments published by Romeo.
>
> - This one could be easily fixed, but currently we can't do filtering on PEP
> without requesting a particular JID. With microblog, we want to be able to
> request e.g. all items with the category/tag "XMPP" regardless of the author.
>
> - When a service offers MAM for both messages and PubSub (e.g. a MUC
> component with PubSub abilities (MUC 2?), or the server itself when it
> offers PEP), there is no way to know whether the filtering fields apply to
> messages, to PubSub, or to both. Look at section 4.1.5 "Retrieving form
> fields": how can I know if "urn:example:xmpp:free-text-search" can be used
> for PubSub or not?
>
> - Section 4.2 says that "The archive results MUST be sorted in chronological
> order". That totally makes sense for message archives, but in the case of
> PubSub it is inconsistent with the classic items retrieval ordering (most
> recent item first), and we may want to sort on fields other than the
> publication date: for instance item update date vs publication date, or
> size of files tracked with PubSub. Of course we can easily reverse the
> order with RSM, but it's not natural, and we can't sort on other fields.
>
> - Overall, PubSub already manages archives by design, but it lacks a good
> search tool. Even if it is tempting to use MAM with PubSub because we get
> filtering "for free", I really think it is not suited to it at all, and
> PubSub deserves a real dedicated searching/filtering tool.
>
> If other people are interested, I would like to work on a "PubSub searching"
> protoXEP. PubSub will probably be at the core of many major features in XMPP
> in the future, so we need a good, generic, and extensible way to
> search/filter items.
>
> Regards
> Goffi
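The suggestion in the reply above, reusing fields already defined in another XEP's namespace instead of redefining them, could look roughly like the sketch below. Everything about the wire format is hypothetical: the `urn:example:pubsub-search` form type, the `{namespace}name` convention for the field `var`, and the `build_search_form` helper are invented here, and only the `jabber:x:data` form structure and the file-transfer namespace come from existing specifications.

```python
import xml.etree.ElementTree as ET

# Namespace named in the reply; the "name" field below is borrowed from it.
FILE_NS = "urn:xmpp:jingle:apps:file-transfer:4"


def build_search_form(form_type, criteria):
    """Build a data form whose fields are (namespace, name) pairs.

    Hypothetical convention: each field's `var` is "{namespace}name",
    so fields defined by another XEP are reused instead of redefined.
    """
    form = ET.Element("x", {"xmlns": "jabber:x:data", "type": "submit"})
    hidden = ET.SubElement(form, "field", {"var": "FORM_TYPE", "type": "hidden"})
    ET.SubElement(hidden, "value").text = form_type
    for (namespace, name), value in criteria.items():
        field = ET.SubElement(form, "field", {"var": f"{{{namespace}}}{name}"})
        ET.SubElement(field, "value").text = value
    return ET.tostring(form, encoding="unicode")


# "urn:example:pubsub-search" and "report.pdf" are placeholders.
form_xml = build_search_form(
    "urn:example:pubsub-search",
    {(FILE_NS, "name"): "report.pdf"},
)
```

User-defined fields would still need their own naming scheme, as the reply points out, since not every search criterion maps onto an element of an existing namespace.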